CN103218453A - Method and device for splitting file - Google Patents

Info

Publication number
CN103218453A
Authority
CN
China
Prior art keywords
file
directory
content
splitting
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013101549863A
Other languages
Chinese (zh)
Inventor
王卫东
陈勇
叶华
李红梅
郭小芳
胡存刚
宋晓宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANJING LONGYUAN MICROELECTRONIC TECHNOLOGY Co Ltd
Original Assignee
NANJING LONGYUAN MICROELECTRONIC TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANJING LONGYUAN MICROELECTRONIC TECHNOLOGY Co Ltd
Priority to CN2013101549863A
Publication of CN103218453A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for splitting a file. The device comprises a preprocessing module, a positioning module, a searching module and a splitting module. The preprocessing module preprocesses the directory structure of the original file; the positioning module locates the directory order and directory level of the original file; the searching module searches the content file for the content to be cut and locates the position of the directory entry at the start of that content; and the splitting module cuts the extracted content of a sub-directory, pastes the cut content into a new empty file and stores it in a database in the form of a web page. The device splits quickly, gives a good splitting result, and does not cause disordered file layout or system crashes.

Description

A file splitting method and device
Technical field
The present invention relates to management technology for large data objects and belongs to the field of intelligent information processing within the discipline of computer science and technology.
Background art
A key technique in managing large data objects is how to split a data file so that it can be managed and searched intelligently. File splitting methods usually split sequentially, but for large data objects, for example when the data reaches the gigabyte range, sequential splitting is very inefficient and can even exhaust system memory and crash the system. This is because the data file must be opened, copied, pasted, saved and uploaded in memory, which consumes a large amount of system memory.
Summary of the invention
The technical problem to be solved by the present invention is that existing file splitting methods are very inefficient and can even exhaust system memory and crash the system.
To solve this technical problem, the present invention adopts the following technical solution: a file splitting method comprising the following steps: 1) preprocess the directory structure of the file to standardize it; 2) locate the file directory using a two-pointer technique and obtain the number of directory entries in the data file; 3) cut successively from the end of the file toward the start of the document, extracting segments directory by directory according to the order of the file directory; cut the content of each sub-directory, paste the cut content into a new empty file, and save it in a database in the form of a web page.
The traditional file splitting method is sequential: segments are extracted directory by directory in file directory order, the content of each sub-directory is cut, and the cut content is pasted into a new empty file and saved in a database in the form of a web page. The present invention splits the data file according to the file's directory structure, takes the sub-directory as the basic unit for storing and managing the file, and performs cut operations on the file. The advantage of cutting is that, as splitting proceeds, the memory occupied by the original file gradually shrinks and splitting becomes progressively faster. In addition, the present invention works in reverse order, cutting from the end of the file, so the remaining file content never has to move. This avoids the disordered file layout and system crashes that sequential splitting may cause, and yields a satisfactory splitting result.
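The benefit of cutting from the tail can be illustrated with a small sketch. It assumes the document is held in memory as a Python list of paragraphs, which the patent does not specify, and the names `cut_tail` and `cut_head` are hypothetical. Removing a tail slice leaves every remaining paragraph at its original index, whereas removing a head slice shifts all of them.

```python
def cut_tail(paragraphs, start):
    """Remove and return paragraphs[start:]; earlier paragraphs keep their indices."""
    segment = paragraphs[start:]
    del paragraphs[start:]
    return segment


def cut_head(paragraphs, end):
    """Remove and return paragraphs[:end]; every remaining paragraph shifts down."""
    segment = paragraphs[:end]
    del paragraphs[:end]
    return segment


doc = ["1 Intro", "text a", "2 Method", "text b", "text c"]

# Reverse order: cut the last section first.
tail_doc = list(doc)
last_section = cut_tail(tail_doc, 2)
# "1 Intro" is still at index 0 and "text a" at index 1.

# Sequential order: cut the first section first.
head_doc = list(doc)
first_section = cut_head(head_doc, 2)
# "2 Method" has moved from index 2 to index 0.
```

In both cases the working copy shrinks after each cut, which matches the memory argument above; only the tail-first variant avoids moving the content that remains.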
As an improvement of the present invention, the two pointers in step 2) are the pointer Count and the pointer Catalog(). Count is the directory order of the file, and its initial value is the maximum directory number of the file (the total number of directory entries). Catalog() is an array that holds the directory level of the directory entry being split.
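A minimal sketch of how Count and Catalog() might be initialized is given here. It rests on an assumption the patent does not state: that a directory entry is a paragraph beginning with a dotted number such as "1" or "1.1", with the level equal to the number of components. The names `heading_level` and `locate` are hypothetical.

```python
import re

# Assumed heading convention: "<dotted number><space><title>", e.g. "1.1 Scope".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+\S")


def heading_level(paragraph):
    """Return 0 for body text, otherwise the heading's directory level."""
    match = HEADING.match(paragraph)
    return match.group(1).count(".") + 1 if match else 0


def locate(paragraphs):
    """Return (count, catalog): number of directory entries and their levels."""
    catalog = [level for level in map(heading_level, paragraphs) if level]
    count = len(catalog)
    return count, catalog


paragraphs = ["1 Intro", "body", "1.1 Scope", "body", "2 Method"]
count, catalog = locate(paragraphs)
```

Under this convention a body paragraph that happens to start with a number followed by a space would be misread as a heading, so a real implementation would more likely rely on the document's own heading styles.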
A file splitting device comprises: a preprocessing module, which preprocesses the directory structure of the original file to standardize it; a positioning module, which locates the directory order and directory level of the original file; a searching module, which searches the content file for the content to be cut and locates the position of the directory entry at the start of that content; and a splitting module, which cuts the content of the extracted sub-directory segment, pastes the cut content into a new empty file, and saves it to a database in the form of a web page.
The advantages of the present invention are fast splitting, no disordered file layout or system crashes, and a good splitting result.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the present invention.
Detailed description
The device of the present invention comprises the following modules:
Preprocessing module: preprocesses the directory structure of the original file to standardize it;
Positioning module: locates the directory order and directory level of the original file;
Searching module: searches the content file for the content to be cut and locates the position of the directory entry at the start of that content;
Splitting module: cuts the content of the extracted sub-directory segment, pastes the cut content into a new empty file, and saves it to a database in the form of a web page.
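The patent does not say what the preprocessing module does to standardize the directory structure. One possible normalization is sketched here purely as an assumption: full-width digits and punctuation are folded to ASCII, and loosely written heading numbers are rewritten as "<dotted number><space><title>". The function name `normalize_heading` and the target format are hypothetical.

```python
import re
import unicodedata

# Assumed loose heading shapes: "1 . 1 Scope", "2. Method", "３、Results".
LOOSE_HEADING = re.compile(
    r"^\s*(\d+(?:\s*\.\s*\d+)*)"   # dotted number, possibly with stray spaces
    r"(?:\s*[.、]\s*|\s+)"          # separator: trailing punctuation or whitespace
    r"(\S.*)$"                      # heading title
)


def normalize_heading(paragraph):
    """Rewrite a loosely formatted heading as '<number> <title>'.

    Paragraphs that do not look like headings are returned unchanged.
    """
    text = unicodedata.normalize("NFKC", paragraph)  # full-width -> ASCII forms
    match = LOOSE_HEADING.match(text)
    if not match:
        return paragraph
    number = re.sub(r"\s+", "", match.group(1))
    return f"{number} {match.group(2).strip()}"


examples = ["１．２　Scope", "2. Method", "1 . 1 Scope", "plain body text"]
normalized = [normalize_heading(p) for p in examples]
```

After such a pass, the positioning and searching modules can recognize directory entries with a single, strict pattern.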
The method of the present invention comprises the following steps:
1) Preprocess the directory structure of the file to standardize it;
2) Locate the file directory using a two-pointer technique and obtain the number of directory entries in the data file. The two pointers are the pointer Count and the pointer Catalog(): Count is the directory order of the file, and its initial value is the maximum directory number of the file; Catalog() is an array that holds the directory level of the directory entry being split;
3) Cut successively from the end of the file toward the start of the document, extracting segments directory by directory according to the order of the file directory; cut the content of each sub-directory, paste the cut content into a new empty file, and save it in a database in the form of a web page.
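The last part of step 3), saving a cut segment to a database as a web page, could look like the following sketch. The patent names neither a database nor a schema, so SQLite, the table `fragments`, its columns, and the functions `to_webpage` and `save_fragment` are all assumptions made for illustration.

```python
import html
import sqlite3


def to_webpage(title, paragraphs):
    """Wrap a cut segment in a minimal HTML page, escaping the text."""
    safe_title = html.escape(title)
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title></head>\n'
        f"<body>\n<h1>{safe_title}</h1>\n{body}\n</body></html>"
    )


def save_fragment(conn, order, level, title, paragraphs):
    """Store one segment with its directory order and level (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fragments ("
        "dir_order INTEGER PRIMARY KEY, level INTEGER, title TEXT, page TEXT)"
    )
    conn.execute(
        "INSERT INTO fragments VALUES (?, ?, ?, ?)",
        (order, level, title, to_webpage(title, paragraphs)),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the demonstration
save_fragment(conn, 2, 1, "2 Method", ["Cut from the tail.", "a < b"])
row = conn.execute(
    "SELECT level, page FROM fragments WHERE dir_order = 2"
).fetchone()
```

Storing the directory order and level alongside each page keeps enough information to rebuild the original table of contents from the fragments.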
As shown in Fig. 1, the present invention can directly split a Word file that has a standard directory structure. If the directory structure of the Word document is non-standard, it must first be preprocessed and standardized before the method of the present invention is used to split it. The method locates the file directory using a two-pointer technique. The pointer Count is the directory order of the file, and its initial value is the maximum directory number of the file. The pointer Catalog() is an array that holds the directory level of the directory entry to be split, such as first-level directory or second-level directory. Start is the start position of the content to be split, and End is its end position. The splitting flow of the present invention is as follows:
Step 1: Obtain the number of directory entries in the data file and assign it to the pointer Count;
Step 2: Set the split start position Start to the initial value 0;
Step 3: Set the split end position End to its initial value, pointing to the end of the file;
Step 4: Obtain the number of paragraphs in the data file, add one to it, and assign the result to the variable i;
Step 5: Subtract one from the paragraph number held in i and assign the result to i;
Step 6: Determine whether paragraph i is a directory entry of the file; if so, go to Step 7; otherwise, go to Step 5;
Step 7: Obtain the start position of paragraph i and assign it to Start;
Step 8: Cut the content between Start and End;
Step 9: Save the cut content as a web page and store it in the database;
Step 10: Subtract one from the number of directory entries and store the result in the pointer Count;
Step 11: Save Count to the database;
Step 12: Save the level of this directory entry in the array Catalog() and save it to the database;
Step 13: Set End = Start;
Step 14: Determine whether i is greater than 0; if so, go to Step 5; otherwise, go to Step 15;
Step 15: The algorithm ends.
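The fifteen steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the document is modeled as a list of paragraphs with 0-based indices (so i starts at the paragraph count rather than count plus one), headings are recognized by an assumed dotted-number prefix, the loop adds an explicit i > 0 guard to the Step 6 to Step 5 branch, and the three database writes of Steps 9, 11 and 12 are folded into one hypothetical `save` callback.

```python
import re

HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+\S")  # assumed heading convention


def heading_level(paragraph):
    """Return 0 for body text, otherwise the heading's directory level."""
    match = HEADING.match(paragraph)
    return match.group(1).count(".") + 1 if match else 0


def split_from_tail(paragraphs, save):
    """Cut sections off the end of `paragraphs`, last section first.

    `save(order, level, segment)` stands in for the database writes.
    Returns the Catalog array of levels in the order they were cut.
    """
    count = sum(1 for p in paragraphs if heading_level(p))  # Step 1
    catalog = []
    start = 0                                               # Step 2
    end = len(paragraphs)                                   # Step 3
    i = len(paragraphs)                                     # Step 4 (0-based)
    while i > 0:                                            # Step 14, plus guard
        i -= 1                                              # Step 5
        level = heading_level(paragraphs[i])                # Step 6
        if not level:
            continue
        start = i                                           # Step 7
        segment = paragraphs[start:end]                     # Step 8: cut
        del paragraphs[start:end]
        count -= 1                                          # Step 10
        save(count, level, segment)                         # Steps 9, 11, 12
        catalog.append(level)
        end = start                                         # Step 13
    return catalog                                          # Step 15


doc = ["Preface", "1 Intro", "intro text", "1.1 Scope", "scope text",
       "2 Method", "method text"]
saved = []
catalog = split_from_tail(doc, lambda order, level, seg: saved.append((order, level, seg)))
```

Because Count is decremented before it is saved, as in Steps 10 and 11, the stored order values run from the number of entries minus one down to zero. Any text before the first directory entry, such as "Preface" here, is left in the original list.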

Claims (3)

1. A file splitting method, characterized by comprising the following steps:
1) preprocess the directory structure of the file to standardize it;
2) locate the file directory using a two-pointer technique and obtain the number of directory entries in the data file;
3) cut successively from the end of the file toward the start of the document, extracting segments directory by directory according to the order of the file directory; cut the content of each sub-directory, paste the cut content into a new empty file, and save it in a database in the form of a web page.
2. The file splitting method according to claim 1, characterized in that the two pointers in step 2) are the pointer Count and the pointer Catalog(); Count is the directory order of the file, and its initial value is the maximum directory number of the file; Catalog() is an array that holds the directory level of the directory entry being split.
3. A device used by the file splitting method according to any one of claims 1-2, comprising:
a preprocessing module, which preprocesses the directory structure of the original file to standardize it;
a positioning module, which locates the directory order and directory level of the original file;
a searching module, which searches the content file for the content to be cut and locates the position of the directory entry at the start of that content;
a splitting module, which cuts the content of the extracted sub-directory segment, pastes the cut content into a new empty file, and saves it to a database in the form of a web page.
CN2013101549863A 2013-04-28 2013-04-28 Method and device for splitting file Pending CN103218453A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013101549863A CN103218453A (en) 2013-04-28 2013-04-28 Method and device for splitting file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013101549863A CN103218453A (en) 2013-04-28 2013-04-28 Method and device for splitting file

Publications (1)

Publication Number Publication Date
CN103218453A true CN103218453A (en) 2013-07-24

Family

ID=48816240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013101549863A Pending CN103218453A (en) 2013-04-28 2013-04-28 Method and device for splitting file

Country Status (1)

Country Link
CN (1) CN103218453A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597422A (en) * 2020-12-30 2021-04-02 深圳市世强元件网络有限公司 PDF file segmentation method and PDF file loading method in webpage
CN112651988A (en) * 2021-01-13 2021-04-13 重庆大学 Finger-shaped image segmentation, finger-shaped plate dislocation and fastener abnormality detection method based on double-pointer positioning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2326805A1 (en) * 2000-11-24 2002-05-24 Ibm Canada Limited-Ibm Canada Limitee Method and apparatus for deleting data in a database
CN1853166A (en) * 2003-09-30 2006-10-25 英特尔公司 Method and apparatus for thread management of multithreading
CN101128820A (en) * 2004-12-30 2008-02-20 谷歌公司 Document Segmentation Based on Visual Gap
CN101692239A (en) * 2009-10-19 2010-04-07 浙江大学 Method for distributing metadata of distributed type file system
CN102143215A (en) * 2011-01-20 2011-08-03 中国人民解放军理工大学 Network-based PB level cloud storage system and processing method thereof
CN102262658A (en) * 2011-07-13 2011-11-30 东北大学 Method for extracting web data from bottom to top based on entity
CN102819599A (en) * 2012-08-15 2012-12-12 华数传媒网络有限公司 Method for constructing hierarchical catalogue based on consistent hashing data distribution

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
穆飞等: "基于定位目录的元数据管理方法", 《清华大学学报》, 15 August 2009 (2009-08-15) *
高良才等: "一种基于聚类技术的图书目录识别方法", 《北京大学学报》, 20 July 2010 (2010-07-20) *

Similar Documents

Publication Publication Date Title
CN103605758B (en) The method and device that a kind of mobile terminal document is searched
US9846702B2 (en) Indexing of file in a hadoop cluster
CN102169507A (en) Distributed real-time search engine
US11003845B2 (en) Systems and methods for reduced memory usage when processing spreadsheet files
CN106874481B (en) Method and system for reading metadata information of distributed file system
US9842158B2 (en) Clustering web pages on a search engine results page
CN107844493B (en) File association method and system
CN103778202A (en) Enterprise electronic document managing server side and system
CN102930060A (en) Method and device for performing fast indexing of database
CN102411617A (en) Method for storing and inquiring mass URLs
US8818971B1 (en) Processing bulk deletions in distributed databases
CN104778182A (en) Data import method and system based on HBase (Hadoop Database)
US11210134B2 (en) Atomic execution unit for object storage
CN107066503A (en) The method and device of magnanimity metadata burst distribution
CN102708148A (en) Duplication eliminating method based on multidimensional lattice data spatial model
CN104462349A (en) File processing method and file processing device
CN103218453A (en) Method and device for splitting file
US20130304871A1 (en) Continually Updating a Channel of Aggregated and Curated Media Content Using Metadata
CN102799661A (en) Method and system for implementing semantic retrieval on electronic files
CN102831181B (en) Directory refreshing method for cache files
CN102521383A (en) Method for storing and accessing mass files in distributed system
CN104252537A (en) Index fragmentation method based on mail characteristics
US8700583B1 (en) Dynamic tiermaps for large online databases
CN103853832A (en) Customizable data capturing method in full-text retrieval system
US20130297576A1 (en) Efficient in-place preservation of content across content sources

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130724
