CN103218453A

CN103218453A - Method and device for splitting file

Info

Publication number: CN103218453A
Application number: CN2013101549863A
Authority: CN
Inventors: 王卫东; 陈勇; 叶华; 李红梅; 郭小芳; 胡存刚; 宋晓宁
Original assignee: NANJING LONGYUAN MICROELECTRONIC TECHNOLOGY Co Ltd
Current assignee: NANJING LONGYUAN MICROELECTRONIC TECHNOLOGY Co Ltd
Priority date: 2013-04-28
Filing date: 2013-04-28
Publication date: 2013-07-24

Abstract

The invention discloses a method and a device for splitting a file. The device comprises a pre-processing module, a locating module, a searching module and a splitting module. The pre-processing module pre-processes the directory structure of the original file; the locating module locates the directory order and directory level of the original file; the searching module searches the content file for the content to be cut and locates the position of the directory entry at which that content begins; and the splitting module cuts the extracted content of each sub-directory, pastes it into a new empty file, and stores it in a database in the form of a web page. Splitting is fast and effective, and it does not cause disordered file layout or system crashes.

Description

A file splitting method and device

Technical field

The present invention relates to management technology for large data objects and belongs to the field of intelligent information processing within computer science and technology.

Background art

A key technique in the management of large data objects is how to split a data file so that it can be managed and searched intelligently. File splitting methods usually split sequentially. For large data objects, for example when the data size reaches the gigabyte range, sequential splitting is very inefficient and can even exhaust system memory and crash the system. This is because the data file must be opened, copied, pasted, saved and uploaded in memory, which consumes a large amount of system memory.

Summary of the invention

The technical problem to be solved by the present invention is that existing file splitting methods are very inefficient and can even exhaust system memory and crash the system.

To solve the above technical problem, the present invention adopts the following technical solution: a file splitting method comprising the following steps: 1) pre-processing the directory structure of the file to standardize it; 2) locating the file directory using a two-pointer technique and obtaining the number of directory entries in the data file; 3) cutting in turn from the end of the file toward the beginning of the document, extracting content section by section according to the file directory, cutting the content of each sub-directory, pasting the cut content into a new empty file, and saving it in a database in the form of a web page.

The traditional file splitting method is sequential: content is extracted section by section in the order of the file directory, the content of each sub-directory is cut and pasted into a new empty file, and the result is saved in a database in the form of a web page. The present invention splits the data file according to its directory structure, treats the sub-directory as the basic unit for storing and managing the file, and performs cut operations on the file. The advantage of cutting is that, as splitting proceeds, the memory occupied by the original file steadily shrinks and the splitting speed steadily increases. In addition, the present invention works in reverse order, cutting from the end of the file, so that the remaining file content does not move. This avoids the disordered file layout and system crashes that sequential splitting may cause, and yields a satisfactory splitting result.
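The effect of cutting from the tail can be illustrated with a minimal Python sketch. It assumes the section start offsets are already known and works on a plain byte file rather than a Word document; the function names `cut_tail` and `split_from_tail` are hypothetical and do not come from the patent. Because each cut removes only the end of the file, the offsets of the sections not yet cut stay valid and the original file shrinks after every step.

```python
def cut_tail(path, start):
    """Cut bytes [start, EOF) out of the file at `path` and return them."""
    with open(path, "r+b") as f:
        f.seek(start)
        tail = f.read()
        # Truncating at `start` removes the tail; nothing before it moves.
        f.truncate(start)
    return tail


def split_from_tail(path, section_offsets):
    """Cut sections from last to first; return (offset, bytes) in document order."""
    pieces = []
    for start in sorted(section_offsets, reverse=True):
        pieces.append((start, cut_tail(path, start)))
    pieces.reverse()
    return pieces
```

Cutting in ascending order instead would require either copying the remaining content forward after each cut or recomputing every later offset, which is the content movement the reverse order avoids.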

As an improvement of the present invention, the two pointers in step 2) are pointer Count and pointer Catalog(). Pointer Count is the directory order of the file, and its initial value is the maximum directory number of the file. Pointer Catalog() is an array that holds the directory level of the directory being split.
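One possible reading of the two pointers is sketched below in Python. The class name `SplitPointers`, the function `walk_backwards`, and the representation of the directory as a list of integer levels are assumptions made for illustration. Count is decremented before it is recorded, following the order of the steps given in the embodiment, so the recorded sequence numbers are zero-based.

```python
from dataclasses import dataclass, field


@dataclass
class SplitPointers:
    count: int                                    # pointer Count: directory order
    catalog: list = field(default_factory=list)   # pointer Catalog(): directory levels


def walk_backwards(levels):
    """Visit directory entries from last to first, updating both pointers.

    `levels` lists the level of each directory entry in document order
    (1 = first-level directory, 2 = second-level directory, and so on).
    """
    ptr = SplitPointers(count=len(levels))  # initial value: maximum directory number
    records = []
    for level in reversed(levels):
        ptr.count -= 1                 # decrement first, then record
        ptr.catalog.append(level)
        records.append((ptr.count, level))
    return ptr, records
```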

A file splitting device comprises: a pre-processing module, used to pre-process the directory structure of the source document and standardize it; a locating module, used to locate the directory order and directory level of the source document; a searching module, used to search the content file for the content to be cut and to locate the position of the directory entry at which that content begins; and a splitting module, used to cut the content of each sub-directory extracted section by section, paste the cut content into a new empty file, and save it in a database in the form of a web page.

The advantages of the present invention are that splitting is fast, it does not cause disordered file layout or system crashes, and the splitting result is good.

Description of drawings

Fig. 1 is a schematic flowchart of the present invention.

Embodiment

The device of the present invention comprises the following modules:

A pre-processing module, used to pre-process the directory structure of the source document and standardize it;

A locating module, used to locate the directory order and directory level of the source document;

A searching module, used to search the content file for the content to be cut and to locate the position of the directory entry at which that content begins;

A splitting module, used to cut the content of each sub-directory extracted section by section, paste the cut content into a new empty file, and save it in a database in the form of a web page.
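The final step of the splitting module, saving a cut section as a web page in a database, might look like the following Python sketch. The patent does not name a database or a page layout, so SQLite, the table name `sections`, its columns, and the helper names are all assumptions chosen for illustration.

```python
import html
import sqlite3


def to_webpage(title, paragraphs):
    """Wrap the paragraphs of one cut section in a minimal HTML page."""
    body = "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        f"<html><head><title>{html.escape(title)}</title></head>"
        f"<body>{body}</body></html>"
    )


def open_store(path=":memory:"):
    """Open the (assumed) SQLite store and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sections ("
        "seq INTEGER PRIMARY KEY, level INTEGER NOT NULL, page TEXT NOT NULL)"
    )
    return conn


def save_section(conn, seq, level, paragraphs):
    """Store one section; the first paragraph is taken to be its heading."""
    page = to_webpage(paragraphs[0], paragraphs)
    conn.execute(
        "INSERT INTO sections (seq, level, page) VALUES (?, ?, ?)",
        (seq, level, page),
    )
    conn.commit()
    return page
```

Storing the directory order (`seq`) and level alongside each page keeps enough information to rebuild the directory tree from the database alone.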

The method of the present invention comprises the following steps:

1) Pre-process the directory structure of the file to standardize it;

2) Locate the file directory using a two-pointer technique and obtain the number of directory entries in the data file. The two pointers are pointer Count and pointer Catalog(). Pointer Count is the directory order of the file, and its initial value is the maximum directory number of the file. Pointer Catalog() is an array that holds the directory level of the directory being split;

3) Cut in turn from the end of the file toward the beginning of the document, extracting content section by section according to the file directory. The content of each sub-directory is cut, pasted into a new empty file, and saved in a database in the form of a web page.

As shown in Fig. 1, the present invention can directly split a Word file that has a standard directory structure. If the directory structure of the Word file is not standard, it must first be pre-processed and standardized before it is split with the method of the present invention. The method locates the file directory using a two-pointer technique. Pointer Count is the directory order of the file, and its initial value is the maximum directory number of the file. Pointer Catalog() is an array that holds the directory level of the directory to be split, such as first-level directory, second-level directory, and so on. Start is the start position of the content to be split, and End is its end position. The splitting flow of the present invention is as follows:

Step 1: Obtain the number of directory entries in the data file and assign it to pointer Count;

Step 2: Set the split start position Start to the initial value 0;

Step 3: Set the split end position End to its initial value, pointing to the end of the file;

Step 4: Obtain the number of paragraphs in the data file, add one to it, and assign the result to variable i;

Step 5: Subtract one from the paragraph number and assign the result to variable i (that is, i = i - 1);

Step 6: Determine whether paragraph i is a directory entry of the file; if so, go to step 7; if not, go to step 5;

Step 7: Obtain the start position of paragraph i and assign it to Start;

Step 8: Cut the content between Start and End;

Step 9: Save the cut content as a web page and store it in the database;

Step 10: Subtract one from the directory count and store the result in pointer Count;

Step 11: Save Count to the database;

Step 12: Save the level of this directory entry into array Catalog() and save it to the database;

Step 13: Set End = Start;

Step 14: Determine whether i is greater than 0; if so, go to step 5; if not, go to step 15;

Step 15: The algorithm ends.
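The flow above can be sketched in Python under several assumptions that are not in the patent: the document is modelled as a list of paragraph strings rather than a Word file, a directory entry is recognized by a numbered-heading pattern such as "1 Title" or "2.3 Title", and a list of records stands in for the database. Paragraph indices are zero-based here, and the loop checks i > 0 before every decrement, which also covers text that precedes the first directory entry.

```python
import re

# Assumed standardized heading form, e.g. "1 Title" or "2.3 Title".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+\S")


def is_directory(paragraph):
    return HEADING.match(paragraph) is not None


def directory_level(paragraph):
    # "1" -> level 1, "2.3" -> level 2, and so on.
    return HEADING.match(paragraph).group(1).count(".") + 1


def split_document(paragraphs):
    """Cut `paragraphs` (modified in place) section by section from the tail.

    Returns (records, catalog): records holds (count, level, section)
    tuples in the order the sections were cut.
    """
    count = sum(1 for p in paragraphs if is_directory(p))  # step 1
    catalog = []                    # array Catalog()
    records = []                    # stand-in for the database
    end = len(paragraphs)           # step 3: End points to the end of the file
    i = len(paragraphs)             # step 4 (zero-based: one past the last paragraph)
    while i > 0:                    # step 14
        i -= 1                      # step 5
        if not is_directory(paragraphs[i]):  # step 6
            continue
        start = i                   # step 7
        section = paragraphs[start:end]      # step 8: cut Start..End
        del paragraphs[start:end]
        count -= 1                  # step 10
        level = directory_level(section[0])
        catalog.append(level)       # step 12
        records.append((count, level, section))  # steps 9 and 11
        end = start                 # step 13
    return records, catalog
```

For example, splitting `["1 Intro", "text a", "1.1 Scope", "text b", "2 Method", "text c"]` cuts the "2 Method" section first and the "1 Intro" section last, and leaves the input list empty.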

Claims

1. A file splitting method, characterized in that it comprises the following steps:

2) locating the file directory using a two-pointer technique and obtaining the number of directory entries in the data file;

2. The file splitting method according to claim 1, characterized in that the two pointers in step 2) are pointer Count and pointer Catalog(); pointer Count is the directory order of the file, and its initial value is the maximum directory number of the file; pointer Catalog() is an array that holds the directory level of the directory being split.

3. A device used by the file splitting method according to any one of claims 1-2, comprising:

A splitting module, used to cut the content of each sub-directory extracted section by section, paste the cut content into a new empty file, and save it in a database in the form of a web page.