
HK1176438B - Method and device for shrinking virtual magnetic disk image file - Google Patents


Info

Publication number
HK1176438B
Authority
HK
Hong Kong
Prior art keywords
sector, data, garbage, file, bitmap
Application number
HK13103842.9A
Other languages
Chinese (zh)
Other versions
HK1176438A1
Inventor
宋振华
陈伟才
王倩
万佳
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority claimed from CN201110228838.2A (external priority; see CN102929884B)
Application filed by Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Publication of HK1176438A1
Publication of HK1176438B


Description

Method and device for shrinking virtual disk image file
Technical Field
The present application relates to computer file processing technologies, and in particular, to a method and an apparatus for shrinking a virtual disk image file.
Background
An image file is similar to a ZIP package in that it packs a series of files into a single file according to a certain format, making them easy for users to download and use; examples include trial versions of operating systems and games. Its most important characteristic is that it can be recognized by specific software and burned directly to an optical disc. The notion of an image file can be extended further to hold more information, such as system files, boot files, and partition table information, so that an image file may contain all the information of a partition or even of an entire hard disk.
VHD (Virtual Hard Disk) is an image file format for virtual disks. A VHD file can be used as a virtual disk: it can be formatted with a file system and used to store a virtual machine's data.
VHDs come in three types: fixed disk images, dynamic disk images, and incremental (differencing) disk images. The size of a fixed disk image file is determined when it is created and does not change when a file system is formatted on it or when data is written to or deleted from it. Dynamic and incremental disk image files grow as data is written, but they do not shrink after data is deleted.
For example, a newly created 2 GB dynamic disk image file is initially only a dozen or so KB in size and can grow to about 2 GB as data is written, but deleting data does not reduce its size.
Because of these characteristics, once data is written to a dynamic or incremental disk image file, the file grows, and it does not shrink even if the data is later deleted. The deleted data therefore still occupies storage space in the VHD, which wastes storage.
Disclosure of Invention
The present application provides a method and an apparatus for shrinking a virtual disk image file, aiming to solve the problem that a conventional virtual disk image file does not shrink when data is deleted from it, which wastes storage space.
In order to solve the above problem, the present application discloses a method for shrinking a virtual disk image file, wherein:
the file comprises data blocks and a block allocation table for recording information of each data block;
the method comprises the following steps:
searching for garbage data blocks in the file, and modifying the records of the garbage data blocks in the block allocation table, where a garbage data block is a data block that stores no valid data;
migrating the valid data blocks located after the garbage data blocks into the positions of the garbage data blocks, and modifying the records of the migrated valid data blocks in the block allocation table; and
truncating the file after the data block migration is completed.
Preferably, each data block includes sectors and a sector bitmap that records information about each sector; before the garbage data blocks in the file are searched for, the method further comprises: searching for garbage sectors in the data blocks of the file, and modifying the records of the garbage sectors in the sector bitmap.
Preferably, searching for garbage data blocks in the file includes: searching the sector bitmaps in the data blocks of the file, and treating as a garbage data block any data block whose sector bitmap marks all of its sectors as garbage sectors.
Preferably, searching for garbage sectors in the data blocks of the file includes: searching the sector bitmap in a data block of the file, and if the record for a sector in the sector bitmap indicates that the sector stores data, further judging as follows: if the file is a dynamic disk image file, judging whether the data stored in the sector is all zeros, and if so, treating the sector as a garbage sector; if the file is an incremental disk image file, judging whether the data stored in the sector is identical to the data at the same sector position in the parent image file, and if so, treating the sector as a garbage sector.
Preferably, searching for garbage sectors in the data blocks of the file includes: if the file is a dynamic disk image file, obtaining the original sector bitmap of a data block and an inverted sector bitmap indicating whether each sector stores valid data; and comparing the two bitmaps bit by bit: if the original sector bitmap indicates that a sector stores data but the inverted sector bitmap indicates that it does not, the sector is treated as a garbage sector.
Preferably, searching for garbage sectors in the data blocks of the file includes: if the file is an incremental disk image file, obtaining the original sector bitmap of a data block, an inverted sector bitmap indicating whether each sector stores valid data, and the sector bitmap of the same data block in the ancestor image files; and comparing the original sector bitmap, the inverted sector bitmap, and the ancestor image sector bitmap bit by bit: if the original sector bitmap indicates that a sector stores data, while both the ancestor image sector bitmap and the inverted sector bitmap indicate that it does not, the sector is treated as a garbage sector.
Preferably, truncating the file after the data block migration is completed includes: once migration is finished, a useless area equal in size to all the garbage data blocks has formed at the end of the file, and this area is cut off to obtain the truncated file.
The present application also provides an apparatus for shrinking a virtual disk image file, wherein:
the file comprises data blocks and a block allocation table that records information about each data block;
the apparatus comprises:
a garbage data block searching module, configured to search for garbage data blocks in the file and modify the records of the garbage data blocks in the block allocation table, where a garbage data block is a data block that stores no valid data;
a data migration module, configured to migrate the valid data blocks located after the garbage data blocks into the positions of the garbage data blocks and modify the records of the migrated valid data blocks in the block allocation table; and
a file truncating module, configured to truncate the file after the data block migration is completed.
Preferably, each data block includes sectors and a sector bitmap that records information about each sector; the apparatus further comprises: a garbage sector searching module, configured to search for garbage sectors in the data blocks of the file and modify the records of the garbage sectors in the sector bitmap.
Preferably, the garbage data block searching module searches the sector bitmaps in the data blocks of the file and treats as a garbage data block any data block whose sector bitmap marks all of its sectors as garbage sectors.
Preferably, the garbage sector searching module includes:
a sector bitmap searching submodule, configured to search the sector bitmap in a data block of the file and, if the record for a sector in the sector bitmap indicates that the sector stores data, trigger the judgment submodule;
a judgment submodule, configured to make the following judgments:
if the file is a dynamic disk image file, judging whether the data stored in the sector is all zeros, and if so, treating the sector as a garbage sector;
if the file is an incremental disk image file, judging whether the data stored in the sector is identical to the data at the same sector position in the parent image file, and if so, treating the sector as a garbage sector.
Preferably, if the file is a dynamic disk image file, the garbage sector searching module includes:
a first acquisition submodule, configured to obtain the original sector bitmap of a data block and an inverted sector bitmap indicating whether each sector stores valid data;
a first comparison submodule, configured to compare the original sector bitmap and the inverted sector bitmap bit by bit: if the original sector bitmap indicates that a sector stores data but the inverted sector bitmap indicates that it does not, the sector is treated as a garbage sector.
Preferably, if the file is an incremental disk image file, the garbage sector searching module includes:
a second acquisition submodule, configured to obtain the original sector bitmap of a data block, an inverted sector bitmap indicating whether each sector stores valid data, and the sector bitmap of the same data block in the ancestor image files;
a second comparison submodule, configured to compare the original sector bitmap, the inverted sector bitmap, and the ancestor image sector bitmap bit by bit: if the original sector bitmap indicates that a sector stores data, while both the ancestor image sector bitmap and the inverted sector bitmap indicate that it does not, the sector is treated as a garbage sector.
Compared with the prior art, the present application has the following advantages:
First, the application searches for garbage data blocks in the virtual disk image file, migrates the valid data blocks located after them into their positions so that a useless area forms at the end of the file, and finally cuts off that area. The garbage data blocks are thereby removed and the file shrinks. With the application, after some data is deleted from a dynamic or incremental disk image file in VHD format, the size of the image file can be reduced, saving the storage space the file requires.
Second, for searching for garbage data blocks, the application provides a method for quickly calculating garbage sectors, which identifies more quickly which sectors in a data block are garbage sectors; data blocks whose sectors are all garbage sectors are then treated as garbage data blocks.
Of course, a product implementing the present application need not achieve all of the above advantages at the same time.
Drawings
FIG. 1 is a schematic diagram of the format of a VHD image file according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the BAT in an embodiment of the present application;
FIG. 3 is a flowchart of a method for shrinking a virtual disk image file according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for shrinking a virtual disk image file according to another embodiment of the present application;
FIG. 5 is a schematic diagram of the BAT of the embodiment of FIG. 2 after garbage data blocks have been found;
FIG. 6 is a schematic diagram of the BAT of the embodiment of FIG. 5 after data migration;
FIG. 7 is a schematic diagram of the BAT of the embodiment of FIG. 6 after file truncation;
FIG. 8 is a structural diagram of an apparatus for shrinking a virtual disk image file according to an embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
The present application provides a method for clearing garbage data from a virtual disk image file and thereby shrinking the file, so that after some data in the image file has been deleted, the image file can be made smaller and the storage space it requires is saved.
Garbage data refers to data that was generated while the virtual machine was running, was later deleted and is no longer needed, but still exists in the virtual disk image file.
The following describes the implementation process of the method in detail by taking an image file in a VHD format as an example.
The format of the VHD image file is first introduced.
Referring to FIG. 1, a schematic diagram of the format of a VHD image file in an embodiment of the present application is shown. A VHD image file typically includes the following parts: the footer, the header, the BAT (block allocation table), and datablocks (data blocks). Each part is described below:
1) footer
The footer records the location of the header, the file size, the image type (dynamic or incremental), and other information. The footer is located at the end of the file: whenever new data is written to the image file, the footer is moved back and rewritten at the new end of the file. Because the footer is the most important information in a VHD file, a backup copy of it is also kept at the very beginning of the file.
2) header
The header records the location of the BAT, the number of BAT entries (i.e., the maximum number of datablocks), the datablock size, and so on. If the file is an incremental image file, the header also records the location of its parent disk (i.e., the parent image file).
3) BAT
Each BAT entry records the start sector of the corresponding datablock; if an entry has no corresponding datablock, it is recorded as 0xFFFFFFFF, meaning that the datablock has not been allocated yet.
4) datablock
Each datablock consists of a sector bitmap followed by the data area (sectors).
For a dynamic VHD, a 0 bit in the sector bitmap indicates that the corresponding sector stores no data, and a 1 bit indicates that the sector holds data; however, if the sector's data is later deleted, the bit is not reset and remains 1. In other words, if a sector has never been modified since the datablock was allocated to the VHD file, its bit is 0; once it has been modified, its bit is 1.
For an incremental VHD, a 1 bit indicates that the incremental disk itself stores valid data for that sector; a 0 bit indicates that it does not, and the data must be read from the corresponding sector of its parent disk. If the parent disk is also an incremental disk, the same check is repeated on the parent's parent, and so on, until a non-incremental (fixed or dynamic) ancestor disk is reached.
All datablocks in an image file have the same size, which is recorded in the header.
Given the image file format shown in FIG. 1, a newly created dynamic or incremental image file contains only the footer, header, BAT, and similar parts, with no datablocks allocated. When data is to be written, datablocks are allocated to hold it, and the start sector of each new datablock is recorded in the corresponding BAT entry.
To read data stored in a VHD image file, the logical sector number to be read (numbered within the stored data, not a physical sector of the VHD image file) must be converted into a physical sector number within the VHD image file, as follows:
BlockNumber = LogicalSectorNumber / SectorsPerBlock
where "/" denotes integer division, e.g. 5 / 3 = 1;
SectorInBlock = LogicalSectorNumber % SectorsPerBlock
where "%" denotes the remainder of division, e.g. 5 % 3 = 2;
PhysicalSectorLocation = BAT[BlockNumber] + BlockBitmapSectorCount + SectorInBlock
where:
LogicalSectorNumber: the logical sector number;
BlockNumber: the number of the datablock containing the sector;
SectorsPerBlock: the number of sectors in each datablock;
SectorInBlock: the index of the sector within its datablock;
PhysicalSectorLocation: the physical sector number within the VHD image file;
BlockBitmapSectorCount: the number of sectors occupied by the sector bitmap.
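The three formulas can be expressed as a small Python function. This is a minimal sketch, not code from the application: the name `logical_to_physical`, the representation of the BAT as a list of start sectors, and returning `None` for an unallocated datablock are assumptions made for illustration.

```python
from __future__ import annotations

UNALLOCATED = 0xFFFFFFFF  # BAT value meaning "no datablock allocated yet"


def logical_to_physical(logical_sector: int, bat: list[int],
                        sectors_per_block: int,
                        bitmap_sector_count: int) -> int | None:
    """Convert a logical sector number into a physical sector of the VHD file.

    Returns None if the datablock holding the sector is not allocated,
    i.e. the sector has never been written.
    """
    block_number = logical_sector // sectors_per_block    # "/" = integer division
    sector_in_block = logical_sector % sectors_per_block  # "%" = remainder
    block_start = bat[block_number]
    if block_start == UNALLOCATED:
        return None
    # Skip the datablock's sector bitmap, then index into its data area.
    return block_start + bitmap_sector_count + sector_in_block
```

For instance, with 4096 sectors per datablock (2 MB of 512-byte sectors) and a one-sector bitmap, both assumed here, logical sector 5000 lies in datablock 1 at offset 904, so if BAT[1] is 12327 the result is 12327 + 1 + 904 = 13232.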
Note that, from the virtual machine user's point of view, a VHD image file may be deployed as a disk partition and used by some file system. When data is written, different file systems place it in different physical areas according to their own write strategies, and in most cases data is not written sequentially from the start of the file system toward the end (i.e., from lower to higher sector numbers). Consequently, in a VHD file the order of the datablock numbers usually differs from the order of their start sector numbers.
For example, assuming each datablock is 2 MB as in FIG. 1, if a virtual machine user first writes a few KB of data in the 3rd MB, the system allocates datablock 1 to the VHD file first and does not allocate datablock 0.
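The allocate-on-first-write behavior described above, and why it yields a BAT whose order differs from the physical layout, can be illustrated with a toy model. The class `ToyDynamicDisk` and its parameters (`block_span`, the number of sectors a datablock occupies including its bitmap, and `first_block_sector`, where the first datablock is placed) are hypothetical. The model tracks only the BAT and the sector bitmaps, and it deliberately has no delete operation, mirroring the fact that the VHD layer never clears bits when the guest file system deletes data.

```python
from __future__ import annotations

UNALLOCATED = 0xFFFFFFFF


class ToyDynamicDisk:
    """Hypothetical in-memory model of datablock allocation in a dynamic VHD.

    Sector contents, the header and the footer are not modeled.
    """

    def __init__(self, max_blocks: int, sectors_per_block: int,
                 block_span: int, first_block_sector: int) -> None:
        self.bat = [UNALLOCATED] * max_blocks       # one entry per possible datablock
        self.bitmaps: dict[int, list[int]] = {}     # block number -> sector bitmap
        self.sectors_per_block = sectors_per_block
        self.block_span = block_span                # sectors per block incl. bitmap
        self.next_free = first_block_sector         # new blocks are appended here

    def write_sector(self, logical_sector: int) -> None:
        block = logical_sector // self.sectors_per_block
        if self.bat[block] == UNALLOCATED:
            # First write into this block: append a new datablock at the end.
            self.bat[block] = self.next_free
            self.next_free += self.block_span
            self.bitmaps[block] = [0] * self.sectors_per_block
        # Mark the sector as holding data; nothing in this model clears it again.
        self.bitmaps[block][logical_sector % self.sectors_per_block] = 1
```

Writing a sector in the 3rd MB first (logical sector 4106, with 4096 sectors per 2 MB block) allocates DB1 at `first_block_sector` while DB0 stays at 0xFFFFFFFF; a later write to sector 0 then places DB0 physically after DB1.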
As another example, FIG. 2 is a schematic diagram of the BAT in an embodiment of the present application.
For a VHD file that can hold at most 16 datablocks, the BAT at a given moment records the numbers DB0 to DB15 of the 16 datablocks and the start sector number of each, which is the relative position of that datablock within the VHD file. A start sector number of 0xFFFFFFFF means the datablock has not been allocated yet.
As FIG. 2 shows, the start sector numbers in the BAT do not increase in the order DB0 to DB15.
Based on the above, the following analyzes why, in the prior art, deleting data does not shrink a VHD file.
In the prior art, when the datablock containing a sector to be written has not been allocated (its BAT entry is 0xFFFFFFFF), a new datablock is allocated (its start sector number is recorded in the BAT entry), and the bit for the sector to be written is set to 1 in the sector bitmap of the new datablock. However, when data is later deleted, even if none of the data in a datablock is valid any more, the corresponding bits in its sector bitmap are not reset and remain 1; that is, the datablock stays allocated. The prior art therefore cannot reclaim the storage space the datablock occupies for storing other data, which wastes storage space.
For this reason, to save the space occupied by a VHD file, the file should be truncated. Before truncation, data located toward the end of the file is migrated into the positions that hold garbage data, so that a useless area forms at the end of the file, and that area is then truncated.
Based on this idea, the present application provides a method for shrinking a virtual disk image file that effectively removes garbage data from the file and thereby shrinks it.
The contents of the present application will be described in detail below with reference to examples.
Fig. 3 is a flowchart of a method for shrinking a virtual disk image file according to an embodiment of the present application.
Again taking the shrinking of a VHD image file as an example, the steps are as follows:
Step 301: search for garbage data blocks in the file, and modify the records of the garbage data blocks in the block allocation table, where a garbage data block is a data block that stores no valid data;
The datablocks in the file fall into two kinds: valid data blocks, which store valid data, and garbage data blocks, whose data has all been deleted, i.e., whose data is no longer valid.
After a garbage datablock is found, its start sector number in the block allocation table (BAT) is set to 0xFFFFFFFF, meaning that the corresponding logical data block no longer stores data.
Step 302: migrate the valid data blocks located after the garbage data blocks into the positions of the garbage data blocks, and modify the records of the migrated valid data blocks in the block allocation table;
Here "after" refers to the physical storage area of the file: a valid data block located toward the end of the file is migrated into the position of a garbage data block, and the start sector number of that valid data block in the BAT is changed to the original start sector number of the garbage data block.
"Migration" specifically means copying the data of a valid data block located after a garbage data block into the position of that garbage data block.
Step 303: truncate the file after the data block migration is completed.
The purpose of the migration is to move valid data blocks as far toward the front of the file as possible, so that a useless area forms at the end of the file; cutting off this area finally reduces the length of the file.
In order to make the content of the embodiment shown in fig. 3 more comprehensible to a person skilled in the art, a more detailed description is given below by means of another preferred embodiment shown in fig. 4.
Referring to fig. 4, a flowchart of a method for shrinking a virtual disk image file according to another embodiment of the present application is shown. Still take VHD files as an example:
Step 401: search for garbage sectors in the data blocks of the file;
The specific search method is as follows:
For each sector in each datablock, check the sector bitmap of the datablock. If the bitmap record for a sector indicates that the sector stores data (for example, its bit is 1), the sector may hold valid data or its data may have been deleted, so the following judgments are needed:
1) if the file is a dynamic disk image file, judge whether the data stored in the sector is all zeros; if so, treat the sector as a garbage sector;
2) if the file is an incremental disk image file, judge whether the data stored in the sector is identical to the data at the same sector position in the parent image file; if so, treat the sector as a garbage sector.
Step 402: modify the records of the garbage sectors in the sector bitmap;
After a garbage sector is found, its bit in the sector bitmap is cleared to 0, indicating that the sector stores no data.
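Steps 401 and 402 (the content-comparison approach) can be sketched for a single datablock as follows. The function `clear_garbage_sectors`, the representation of a bitmap as a list of 0/1 integers, and the `parent_sectors` argument (the data a read of each sector would return from the parent chain, assumed to be gathered beforehand) are illustrative assumptions, not part of the application.

```python
from __future__ import annotations

SECTOR_SIZE = 512  # bytes per sector


def clear_garbage_sectors(bitmap: list[int], sectors: list[bytes],
                          disk_type: str,
                          parent_sectors: list[bytes] | None = None) -> list[int]:
    """Return a copy of `bitmap` with the bits of garbage sectors cleared.

    bitmap:         sector bitmap of one datablock (one 0/1 int per sector)
    sectors:        contents of the same sectors in this image
    disk_type:      "dynamic" or "incremental"
    parent_sectors: for incremental disks, what each sector reads as in the
                    parent chain (assumed to be precomputed by the caller)
    """
    zero = bytes(SECTOR_SIZE)
    result = list(bitmap)
    for i, bit in enumerate(bitmap):
        if not bit:
            continue  # bit already 0: the sector stores no data
        if disk_type == "dynamic":
            garbage = sectors[i] == zero               # all-zero sector
        elif disk_type == "incremental":
            if parent_sectors is None:
                raise ValueError("incremental disks need parent_sectors")
            garbage = sectors[i] == parent_sectors[i]  # same as parent
        else:
            raise ValueError(f"unknown disk type: {disk_type}")
        if garbage:
            result[i] = 0  # step 402: clear the bit
    return result
```

Only sectors whose bit is 1 are read, matching the text: a 0 bit already means the sector stores no data.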
Step 403: search for garbage data blocks in the file;
The specific search method is as follows:
Search the sector bitmaps in the data blocks of the file, and treat as a garbage data block any data block whose sector bitmap marks all of its sectors as garbage sectors.
If a datablock is in use (its BAT entry holds a value other than 0xFFFFFFFF) but its sector bitmap is all 0 after the two steps above, the datablock can be treated as a garbage datablock, because it no longer stores valid data.
However, if the sector bitmap of a datablock contains both 0 bits and 1 bits, the datablock is not a garbage datablock; it is still a valid data block.
Step 404: modify the records of the garbage data blocks in the block allocation table;
In the BAT, the start sector entry of each garbage datablock is set to 0xFFFFFFFF.
For example, in FIG. 2, if the three datablocks DB5, DB13, and DB14, with start sectors 12327, 24639, and 28743 respectively, are found to be garbage datablocks through the three steps above, their BAT entries are set to 0xFFFFFFFF, as shown in FIG. 5.
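Steps 403 and 404 reduce to a scan of the BAT. In this hypothetical sketch, `bitmaps` maps each allocated block number to its sector bitmap after step 402; `mark_garbage_blocks` clears the BAT entries of garbage blocks in place and returns their former start sectors, which the migration step needs as target locations.

```python
from __future__ import annotations

UNALLOCATED = 0xFFFFFFFF


def mark_garbage_blocks(bat: list[int], bitmaps: dict[int, list[int]]) -> list[int]:
    """Find allocated blocks whose sector bitmap is all 0 and unallocate them.

    Modifies `bat` in place and returns the original start sectors of the
    garbage blocks, in block-number order.
    """
    garbage_starts = []
    for block, start in enumerate(bat):
        if start == UNALLOCATED:
            continue                    # never allocated: not a garbage block
        if not any(bitmaps[block]):     # every bit was cleared in step 402
            garbage_starts.append(start)
            bat[block] = UNALLOCATED    # step 404
    return garbage_starts
```

Applied to a BAT holding only the five datablocks named in the text (DB5, DB13, DB14, DB7, and DB8 with the start sectors given for FIGs. 2 and 5), with all-zero bitmaps for DB5, DB13, and DB14, it returns [12327, 24639, 28743] and leaves DB7 and DB8 untouched. The remaining entries of FIG. 2 are not reproduced here.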
Step 405: data migration;
Each valid datablock (source location) located after a garbage datablock (target location) is copied, including its sector bitmap and data, into the position of the garbage datablock, and the start sector of the migrated valid datablock is updated in the BAT.
For example, in FIG. 5, the valid datablock with start sector 36951 (DB8) can be migrated into the area starting at sector 12327, changing the start sector in BAT[8] to 12327; the valid datablock with start sector 32847 (DB7) is migrated into the area starting at sector 24639, changing BAT[7] to 24639. The result is the file shown in FIG. 6. At the end of that file, the datablock at start sector 28743 is still a garbage datablock, and the data of the datablocks at start sectors 32847 and 36951 has been copied to start sectors 24639 and 12327, so those two datablocks no longer need to be kept in the VHD file. Together these three datablocks form a useless area.
Other migration strategies are also possible, but the general goal is the same: to form a useless area at the end of the file that can be truncated.
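One migration strategy consistent with the FIG. 5 to FIG. 6 example is to repeatedly move the physically last valid datablock into the lowest free slot, stopping when no free slot lies in front of the next candidate. The sketch below only plans the moves and updates the BAT; copying each block's bitmap and data is left to the caller. The function name `plan_migration` and its arguments are assumptions, and, as noted above, other strategies are possible.

```python
from __future__ import annotations

UNALLOCATED = 0xFFFFFFFF


def plan_migration(bat: list[int],
                   garbage_starts: list[int]) -> list[tuple[int, int, int]]:
    """Plan moves of valid datablocks into garbage slots nearer the file start.

    `bat` must already have garbage blocks set to UNALLOCATED; it is updated
    in place. Returns (block_number, old_start, new_start) for each move.
    """
    holes = sorted(garbage_starts)  # free slots, lowest first
    # Valid blocks ordered by physical position, last one first.
    valid = sorted(((start, block) for block, start in enumerate(bat)
                    if start != UNALLOCATED), reverse=True)
    moves = []
    for start, block in valid:
        if not holes or holes[0] > start:
            break  # no free slot in front of this block: migration is done
        new_start = holes.pop(0)
        moves.append((block, start, new_start))
        bat[block] = new_start  # record the block's new position
    return moves
```

With garbage slots 12327, 24639, and 28743 and valid blocks DB8 at 36951 and DB7 at 32847, it yields the two moves of the example, DB8 to 12327 and then DB7 to 24639; the slot at 28743 stays empty because the remaining valid blocks lie before it.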
Step 406: truncate the file.
After the data block migration is completed, that is, once valid data has been stored again in the garbage datablocks found in step 403, or no valid datablock remains after them to be copied, a useless area has formed at the end of the file; cutting it off yields the truncated file. The amount truncated is at most the total size of the garbage datablocks found; in this example, the three datablocks at the end of the file are truncated, as shown in FIG. 7.
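The size of the shrunken file follows from the BAT after migration: the data area ends at the last allocated datablock, and the footer is written immediately after it. In this sketch, `block_span` (sectors per datablock including its bitmap) and `metadata_end` (the first sector available for datablocks, after the footer copy, header, and BAT) are assumed inputs; the 512-byte footer size comes from the VHD format.

```python
from __future__ import annotations

UNALLOCATED = 0xFFFFFFFF
SECTOR_SIZE = 512
FOOTER_SIZE = 512  # the VHD footer occupies 512 bytes


def truncated_size(bat: list[int], block_span: int, metadata_end: int) -> int:
    """Size in bytes of the file after the useless area is cut off."""
    allocated = [start for start in bat if start != UNALLOCATED]
    # End of the data area: just past the physically last allocated block,
    # or the end of the metadata if no block is allocated at all.
    data_end = max((s + block_span for s in allocated), default=metadata_end)
    # Everything beyond data_end is the useless area; the footer goes there.
    return data_end * SECTOR_SIZE + FOOTER_SIZE
```

For the FIG. 6 state, assuming a block span of 4104 sectors (the spacing between consecutive start sectors such as 24639 and 28743 in the example), the last allocated datablock starts at 24639, so the data area ends at sector 28743, the start of the useless area. A tool could then write the footer there and call Python's `os.truncate(path, size)`; choosing a crash-safe order for these two operations is outside this sketch.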
In summary, the above steps clear garbage data, so that after some data has been deleted from a dynamic or incremental disk image file in VHD format, the image file can be made smaller, saving the storage space it requires.
In addition, for finding garbage sectors, besides the method above, which reads sectors in turn from the disk itself (dynamic) or also from the parent disk (incremental) and compares them to decide whether they are garbage sectors, the application provides a method for quickly calculating garbage sectors. It identifies more quickly which sectors in a data block are garbage sectors; data blocks whose sectors are all garbage sectors are then treated as garbage data blocks.
Still taking a VHD image file as an example, the fast identification method is described in detail below.
1. For an incremental disk image file:
First, three sector bitmaps are obtained for each datablock:
1) source bitmap, abbreviated src_bmp (original sector bitmap): the sector bitmap of the datablock in this VHD;
2) parent bitmap, abbreviated prt_bmp (ancestor image sector bitmap): read the sector bitmap of the datablock in the parent image; if a bit is 1, the resulting bit is 1; otherwise, continue tracing to the parent's parent, and so on up to the root ancestor image; if the bit is still 0 when the root ancestor is read, the resulting bit is 0.
3) block bitmap, abbreviated blk_bmp (inverted sector bitmap): derived in reverse from the file system that the virtual machine user created on the VHD image file (it is not a bitwise inversion of another bitmap), indicating whether the sector at each position really stores valid information.
A typical file system records, in some form, bitmap information indicating which sectors hold valid information and which do not. For example, ext2/3/4, commonly used on Linux, records which blocks store valid data in the block bitmap of each block group; NTFS, commonly used on Windows, records block (cluster) allocation in the $Bitmap file. Each block consists of several sectors.
After the file system's bitmap information has been obtained, the conversion between logical and physical sector numbers described above shows which sectors in the VHD file hold valid information, which yields blk_bmp.
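The two auxiliary bitmaps can be sketched as follows. `ancestor_bitmap` ORs the sector bitmaps of the same datablock along the parent chain, which is equivalent to the trace described for prt_bmp. `filesystem_bitmap` is a hypothetical helper for blk_bmp: it assumes a single partition starting at `partition_start`, a file system block bitmap already parsed into `fs_used` (parsing ext2/3/4 or NTFS metadata is not shown), and it conservatively reports sectors outside the file system as in use. Because a datablock covers a contiguous range of logical sectors, indexing by logical sector is enough to fill its bitmap.

```python
from __future__ import annotations


def ancestor_bitmap(ancestor_bitmaps: list[list[int] | None], n: int) -> list[int]:
    """prt_bmp: bit i is 1 if any ancestor (parent, grandparent, ...) holds sector i.

    None means the datablock is not allocated in that ancestor. Fixed-size
    ancestors, which have no sector bitmap, are not modeled.
    """
    prt = [0] * n
    for bmp in ancestor_bitmaps:
        if bmp is None:
            continue
        prt = [p | b for p, b in zip(prt, bmp)]
    return prt


def filesystem_bitmap(block_number: int, sectors_per_block: int,
                      fs_used: list[int], fs_block_sectors: int,
                      partition_start: int) -> list[int]:
    """blk_bmp: bit i is 1 if the guest file system uses sector i of the datablock.

    fs_block_sectors: sectors per file system block, e.g. 8 for 4 KiB blocks.
    """
    blk = []
    for i in range(sectors_per_block):
        logical = block_number * sectors_per_block + i
        fs_block = (logical - partition_start) // fs_block_sectors
        if logical < partition_start or fs_block >= len(fs_used):
            blk.append(1)  # outside the file system (e.g. partition table): keep
        else:
            blk.append(fs_used[fs_block])
    return blk
```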
Then the three bitmaps are compared bit by bit to generate the final target bitmap dst_bmp, which becomes the datablock's sector bitmap in the VHD once the bits of the garbage sectors have been cleared to 0 (i.e., after step 402). For each bit:
if src_bmp = 0, then dst_bmp = 0;
if src_bmp = 1, then
    if prt_bmp = 0 and blk_bmp = 1, then dst_bmp = 1;
        // data newly written relative to the parent disk
    if prt_bmp = 1 and blk_bmp = 0, then dst_bmp = 1;
        // data deleted relative to the parent disk
    if prt_bmp = 0 and blk_bmp = 0, then dst_bmp = 0;
        // data newly written relative to the parent disk, but eventually deleted
    if prt_bmp = 1 and blk_bmp = 1, then
        // data modified relative to the parent disk: read the sector from this image
        // and from the parent image and compare; dst_bmp = 0 if identical, else dst_bmp = 1
The third rule above identifies one kind of garbage sector. Three conditions must hold together:
- the original sector bitmap records that the sector stores data (src_bmp = 1);
- the ancestor image's sector bitmap records that the sector stores no data (prt_bmp = 0);
- the inverted sector bitmap records that the sector stores no data (blk_bmp = 0).
When all three hold, the sector is treated as a garbage sector.
The last rule identifies a second kind of garbage sector. If src_bmp = 1, prt_bmp = 1 and blk_bmp = 1, the sector is read from this image and from the parent image and the two are compared. If the data match, dst_bmp is set to 0 and the sector can be treated as a garbage sector.
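The incremental-disk rules can be expressed as a short Python sketch. Two assumptions apply. First, the bitmaps are modeled as lists of 0/1 with one entry per sector, not as the packed on-disk format. Second, `read_child` and `read_parent` are hypothetical callables that return a sector's contents from this image and from the parent image, respectively.

```python
from typing import Callable


def merge_incremental(src_bmp: list[int], prt_bmp: list[int], blk_bmp: list[int],
                      read_child: Callable[[int], bytes],
                      read_parent: Callable[[int], bytes]) -> list[int]:
    """Return dst_bmp for one data block of an incremental (differencing) image."""
    dst_bmp = []
    for i, (src, prt, blk) in enumerate(zip(src_bmp, prt_bmp, blk_bmp)):
        if not src:
            dst_bmp.append(0)            # nothing stored in this image
        elif prt and blk:
            # Modified relative to the parent: keep only if the contents differ.
            dst_bmp.append(0 if read_child(i) == read_parent(i) else 1)
        elif blk:
            dst_bmp.append(1)            # prt=0, blk=1: newly written data
        elif prt:
            dst_bmp.append(1)            # prt=1, blk=0: deletion relative to the parent
        else:
            dst_bmp.append(0)            # prt=0, blk=0: written, then deleted
    return dst_bmp
```

The garbage sectors of the block are exactly the positions where src_bmp is 1 but the returned dst_bmp bit is 0.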
2. The dynamic disk image file case is simpler:
First, the source bitmap (src_bmp) and the block bitmap (blk_bmp) are obtained for each data block. No parent bitmap is needed.
The two bitmaps are then compared bit by bit to generate the final target bitmap dst_bmp, which serves as the sector bitmap after step 402. For each bit:
if src_bmp = 0, then dst_bmp = 0;
if src_bmp = 1, then
if blk_bmp = 1, then dst_bmp = 1; // newly written data
if blk_bmp = 0, then dst_bmp = 0; // data that was newly written but later deleted
The rules above show when a sector is garbage in the dynamic case. If the original sector bitmap records that the sector stores data (src_bmp = 1) but the inverted sector bitmap records that it does not (blk_bmp = 0), the sector is treated as a garbage sector.
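For the dynamic case, the rules reduce to a bitwise AND. The garbage sectors are the bits that are set in src_bmp but clear in blk_bmp. Purely for illustration, the sketch below assumes each bitmap is held as a Python integer whose bit i stands for sector i. This is not the packed VHD on-disk layout, and the function name is hypothetical.

```python
def merge_dynamic(src_bmp: int, blk_bmp: int) -> tuple[int, int]:
    """Return (dst_bmp, garbage) for one block of a dynamic image.

    Bit i of each integer stands for sector i (illustrative encoding only).
    """
    dst_bmp = src_bmp & blk_bmp       # stored here and still used by the file system
    garbage = src_bmp & ~blk_bmp      # stored here but freed by the file system
    return dst_bmp, garbage
```

For example, `merge_dynamic(0b1011, 0b0110)` returns `(0b0010, 0b1001)`. Sectors 0 and 3 are garbage, and sector 1 is kept.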
This embodiment uses image files in the VHD format as an example, but it can also be applied to virtual disk image files in other formats. The principle is similar to that of this embodiment, so the details are not repeated here.
For simplicity, the foregoing method embodiments are described as a series of combined acts. Those skilled in the art should nevertheless understand that the present application is not limited by the order of the acts described, because some steps may be performed in other orders or concurrently. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and that not all of the acts involved are necessarily required by the present application.
Based on the method embodiment described above, the present application further provides a corresponding apparatus for shrinking a virtual disk image file, which implements the method described in that embodiment.
Fig. 8 is a structural diagram of an apparatus for shrinking a virtual disk image file according to an embodiment of the present application.
The apparatus for shrinking the virtual disk image file may include a garbage data block searching module 81, a data migration module 82 and a file truncating module 83, where:
- the garbage data block searching module 81 is configured to search the file for garbage data blocks and modify their corresponding records in the block allocation table, a garbage data block being a data block that stores no valid data;
- the data migration module 82 is configured to migrate the valid data blocks located after a garbage data block into the position of that garbage data block and modify the corresponding records of the migrated valid data blocks in the block allocation table, where migration means a copy operation;
- the file truncating module 83 is configured to truncate the file after the data block migration is completed.
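The cooperation of modules 81 to 83 can be illustrated with a minimal in-memory Python sketch. It is not the patented implementation, and it rests on several assumptions:
- BAT entries hold byte offsets (a real VHD BAT stores big-endian sector offsets).
- Every data block, sector bitmap plus data, occupies `block_size` bytes.
- The blocks are packed contiguously from `data_start` to the end of the image.
- The copy of the VHD footer at the end of a dynamic disk is ignored; a real implementation would have to rewrite it after truncation.

`compact_image` and its parameters are hypothetical.

```python
UNUSED = 0xFFFFFFFF  # marker for an unallocated BAT entry, as in VHD


def compact_image(image: bytearray, bat: list[int], garbage: set[int],
                  data_start: int, block_size: int) -> int:
    """Free garbage blocks, migrate valid blocks forward, truncate.

    Returns the new image length in bytes.
    """
    # Step 1 (module 81): drop garbage data blocks from the block allocation table.
    for index in garbage:
        bat[index] = UNUSED

    # Step 2 (module 82): copy each valid block into the lowest free slot,
    # walking in physical order so blocks only move toward the file start.
    valid = sorted((offset, index) for index, offset in enumerate(bat) if offset != UNUSED)
    next_free = data_start
    for offset, index in valid:
        if offset != next_free:
            image[next_free:next_free + block_size] = image[offset:offset + block_size]
            bat[index] = next_free
        next_free += block_size

    # Step 3 (module 83): the tail is now a useless area; cut it off.
    del image[next_free:]
    return next_free
```

Walking the valid blocks in ascending physical order means each copy only moves a block toward the start of the file. As a result, a block that has not yet been migrated is never overwritten.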
Preferably, since each data block contains sectors and a sector bitmap recording information about each sector, the apparatus may further include:
a garbage sector searching module 84, configured to search the data blocks of the file for garbage sectors and modify the corresponding records of the garbage sectors in the sector bitmap.
Further preferably, the garbage data block searching module 81 may build on the garbage sector searching module 84. Module 81 examines the sector bitmap of each data block in the file and treats as a garbage data block any data block whose sector bitmap records every sector as a garbage sector.
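A minimal sketch of this test follows. It assumes the sector bitmaps have already been updated by module 84 and that an unallocated BAT entry holds 0xFFFFFFFF, as in VHD. `find_garbage_blocks` is a hypothetical name. Each bitmap may be `bytes` or a list of 0/1, because a block qualifies exactly when no bit in its bitmap is set.

```python
UNUSED = 0xFFFFFFFF  # VHD marks an unallocated BAT entry with this value


def find_garbage_blocks(bat: list[int], sector_bitmaps: list) -> list[int]:
    """Indices of allocated blocks whose sector bitmap has no bit set."""
    return [index for index, offset in enumerate(bat)
            if offset != UNUSED and not any(sector_bitmaps[index])]
```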
The garbage sector searching module 84 can search for garbage sectors in two ways. In the first method, the sectors are read in turn and compared to determine whether each one is a garbage sector. For a dynamic disk the comparison is against all zeros, and for an incremental disk it is against the parent disk. For this method, the garbage sector searching module 84 may specifically include the following submodules:
a sector bitmap searching submodule, configured to scan the sector bitmap of each data block in the file and, whenever the record for a sector indicates that the sector stores data, trigger the judgment submodule;
a judgment submodule, configured to make the following judgments:
- if the file is a dynamic disk image file, determine whether the data stored in the sector is all zeros and, if so, treat the sector as a garbage sector;
- if the file is an incremental disk image file, determine whether the data stored in the sector is identical to the data at the same sector position in the parent image file and, if so, treat the sector as a garbage sector.
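The first method can be sketched in Python as follows. The sketch assumes 512-byte sectors and that `sectors` holds the block's sector contents. For an incremental disk, it also assumes that `parent_sectors` already holds the contents the parent chain would return for the same positions. `clear_garbage_sectors` is a hypothetical helper that returns the updated sector bitmap.

```python
from typing import Optional

SECTOR_SIZE = 512  # assumed sector size


def clear_garbage_sectors(bitmap: list[int], sectors: list[bytes],
                          parent_sectors: Optional[list[bytes]] = None) -> list[int]:
    """Return a copy of `bitmap` with the bits of garbage sectors cleared."""
    zero_sector = bytes(SECTOR_SIZE)
    result = list(bitmap)
    for i, stored in enumerate(bitmap):
        if not stored:
            continue  # the bitmap already says nothing is stored here
        # Dynamic disk: compare with zeros; incremental disk: compare with parent.
        reference = zero_sector if parent_sectors is None else parent_sectors[i]
        if sectors[i] == reference:
            result[i] = 0  # garbage sector: clear its bit
    return result
```

This method reads every sector whose bit is set, which is why the bitmap-only comparison that follows is described as faster.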
For the second, faster search method, the garbage sector searching module 84 may include the following submodules:
1) If the file is a dynamic disk image file, the garbage sector searching module 84 may specifically include:
a first acquisition submodule, configured to acquire the original sector bitmap of a data block in the file and an inverted sector bitmap indicating whether each corresponding sector stores valid data;
a first comparison submodule, configured to compare the original sector bitmap and the inverted sector bitmap bit by bit:
if the original sector bitmap records that a sector stores data but the inverted sector bitmap records that it does not, the sector is treated as a garbage sector.
2) If the file is an incremental disk image file, the garbage sector searching module 84 may specifically include:
a second acquisition submodule, configured to acquire the original sector bitmap of a data block in the file, an inverted sector bitmap indicating whether each corresponding sector stores valid data, and the sector bitmap of the same data block in the ancestor image file;
a second comparison submodule, configured to compare the original sector bitmap, the inverted sector bitmap and the ancestor image sector bitmap bit by bit:
if the original sector bitmap records that a sector stores data, while both the ancestor image sector bitmap and the inverted sector bitmap record that it does not, the sector is treated as a garbage sector.
In summary, the apparatus of the above embodiment addresses VHD-format dynamic and incremental disk image files from which data has been deleted. It can reduce the size of such a file and thus save the storage space it occupies.
Because the above apparatus embodiment is essentially similar to the method embodiment, it is described only briefly. For related details, refer to the description of the method embodiment shown in Figs. 1 to 7.
The embodiments in this specification are described progressively: each focuses on its differences from the others, and parts that are the same or similar can be read with reference to one another.
The method and apparatus for shrinking a virtual disk image file provided by the present application have been described in detail above. Specific examples have been used to explain the principles and implementation of the application, and the description of the embodiments is intended only to help in understanding the method and its core idea. A person skilled in the art may, following the idea of the application, vary the specific implementation and the scope of application. Accordingly, the content of this specification should not be construed as limiting the present application.

Claims (12)

1. A method for shrinking a virtual disk image file, characterized in that:
the file comprises data blocks and a block allocation table for recording information of each data block;
the method comprises the following steps:
searching for a garbage data block in the file, and modifying the corresponding record of the garbage data block in the block allocation table, wherein the garbage data block is a data block which does not store valid data, and all data in the garbage data block have been deleted;
migrating the valid data blocks located after the garbage data block to the position of the garbage data block, and modifying the corresponding records of the migrated valid data blocks in the block allocation table, wherein migrating the valid data blocks located after the garbage data block to the position of the garbage data block comprises copying the data in those valid data blocks to the position of the garbage data block;
after the data block migration is completed, truncating the file;
wherein truncating the file after the data block migration is completed comprises: after the data block migration is completed, forming at the tail of the file a useless area equal in size to all of the garbage data blocks, and cutting off the useless area to obtain the truncated file.
2. The method of claim 1, wherein:
each data block comprises sectors and a sector bitmap for recording information of each sector;
before searching for the garbage data block in the file, the method further comprises:
searching for garbage sectors in the data blocks of the file, and modifying the corresponding records of the garbage sectors in the sector bitmap.
3. The method of claim 2, wherein finding garbage data blocks in the file comprises:
searching the sector bitmap in each data block of the file, and taking as a garbage data block any data block whose sector bitmap records all of its sectors as garbage sectors.
4. The method of claim 2, wherein the finding the garbage sectors in the file data block comprises:
searching the sector bitmap in each data block of the file, and, if the record corresponding to a sector in the sector bitmap indicates that data is stored in the sector, further determining:
if the file is a dynamic disk image file, whether the data stored in the sector is all zeros, and if so, taking the sector as a garbage sector;
if the file is an incremental disk image file, whether the data stored in the sector is identical to the data at the same sector position in the parent image file, and if so, taking the sector as a garbage sector.
5. The method of claim 2, wherein the finding the garbage sectors in the file data block comprises:
if the file is a dynamic disk image file, acquiring the original sector bitmap of a data block in the file and an inverted sector bitmap indicating whether each corresponding sector stores valid data;
comparing the original sector bitmap and the inverted sector bitmap bit by bit:
if the original sector bitmap records that a sector stores data and the inverted sector bitmap records that the sector does not store data, treating the sector as a garbage sector.
6. The method of claim 2, wherein the finding the garbage sectors in the file data block comprises:
if the file is an incremental disk image file, acquiring the original sector bitmap of a data block in the file, an inverted sector bitmap indicating whether each corresponding sector stores valid data, and the sector bitmap of the same data block in an ancestor image file;
comparing the original sector bitmap, the inverted sector bitmap and the ancestor image sector bitmap bit by bit:
if the original sector bitmap records that a sector stores data, the ancestor image sector bitmap records that the sector does not store data, and the inverted sector bitmap records that the sector does not store data, treating the sector as a garbage sector.
7. An apparatus for shrinking a virtual disk image file, characterized in that:
the file comprises data blocks and a block allocation table for recording information of each data block;
the apparatus comprises:
a garbage data block searching module, configured to search for a garbage data block in the file and modify the corresponding record of the garbage data block in the block allocation table, wherein the garbage data block is a data block which does not store valid data, and all data in the garbage data block have been deleted;
a data migration module, configured to migrate the valid data blocks located after the garbage data block to the position of the garbage data block and modify the corresponding records of the migrated valid data blocks in the block allocation table, wherein migrating the valid data blocks located after the garbage data block to the position of the garbage data block comprises copying the data in those valid data blocks to the position of the garbage data block;
and a file truncation module, configured to, after the data block migration is completed, form at the tail of the file a useless area equal in size to all of the garbage data blocks, and cut off the useless area to obtain the truncated file.
8. The apparatus of claim 7, wherein:
each data block comprises sectors and a sector bitmap for recording information of each sector;
the device further comprises:
a garbage sector searching module, configured to search for garbage sectors in the data blocks of the file and modify the corresponding records of the garbage sectors in the sector bitmap.
9. The apparatus of claim 8, wherein:
the garbage data block searching module searches the sector bitmap in each data block of the file, and takes as a garbage data block any data block whose sector bitmap records all of its sectors as garbage sectors.
10. The apparatus of claim 8, wherein the garbage sector lookup module comprises:
a sector bitmap searching submodule, configured to search the sector bitmap in each data block of the file and, if the record corresponding to a sector in the sector bitmap indicates that data is stored in the sector, trigger the judgment submodule;
a judgment submodule, configured to make the following judgments:
if the file is a dynamic disk image file, determining whether the data stored in the sector is all zeros, and if so, taking the sector as a garbage sector;
if the file is an incremental disk image file, determining whether the data stored in the sector is identical to the data at the same sector position in the parent image file, and if so, taking the sector as a garbage sector.
11. The apparatus of claim 8, wherein if the file is a dynamic disk image file, the garbage sector lookup module comprises:
a first acquisition submodule, configured to acquire the original sector bitmap of a data block in the file and an inverted sector bitmap indicating whether each corresponding sector stores valid data;
a first comparison submodule, configured to compare the original sector bitmap and the inverted sector bitmap bit by bit:
if the original sector bitmap records that a sector stores data and the inverted sector bitmap records that the sector does not store data, the sector is treated as a garbage sector.
12. The apparatus of claim 8, wherein if the file is an incremental disk image file, the garbage sector searching module comprises:
a second acquisition submodule, configured to acquire the original sector bitmap of a data block in the file, an inverted sector bitmap indicating whether each corresponding sector stores valid data, and the sector bitmap of the same data block in an ancestor image file;
a second comparison submodule, configured to compare the original sector bitmap, the inverted sector bitmap and the ancestor image sector bitmap bit by bit:
if the original sector bitmap records that a sector stores data, the ancestor image sector bitmap records that the sector does not store data, and the inverted sector bitmap records that the sector does not store data, the sector is treated as a garbage sector.
HK13103842.9A 2013-03-27 Method and device for shrinking virtual magnetic disk image file HK1176438B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110228838.2A CN102929884B (en) 2011-08-10 2011-08-10 A kind of method and device that shrinks virtual disk image file

Publications (2)

Publication Number Publication Date
HK1176438A1 (en) 2013-07-26
HK1176438B (en) 2017-04-07


Similar Documents

Publication Publication Date Title
TWI540432B (en) Method and apparatus for collapsing virtual disk mirrors
US11301379B2 (en) Access request processing method and apparatus, and computer device
CN102779180B (en) The operation processing method of data-storage system, data-storage system
US11841826B2 (en) Embedded reference counts for file clones
US11030092B2 (en) Access request processing method and apparatus, and computer system
CN103729262B (en) Operating system heat backup method, device and file system reconstruction method
CN107391774A (en) The rubbish recovering method of JFS based on data de-duplication
CN111007990A (en) Positioning method for quickly positioning data block reference in snapshot system
US20170351608A1 (en) Host device
US6961812B2 (en) Universal disk format volumes with variable size
EP3542273B1 (en) Systems and methods for recovering lost clusters from a mounted volume
JP4159506B2 (en) Hierarchical storage device, recovery method thereof, and recovery program
CN112800007A (en) Directory entry expansion method and system suitable for FAT32 file system
HK1176438B (en) Method and device for shrinking virtual magnetic disk image file
CN114816228B (en) Data processing method, device, server and storage medium
CN119336543B (en) Fingerprint library restoration method, apparatus, computer device, readable storage medium, and program product
US10740015B2 (en) Optimized management of file system metadata within solid state storage devices (SSDs)
CN116303245A (en) A file system snapshot creation method, device, equipment and medium
CN117493294A (en) File system positioning check point method for localization system