CN104410868B

CN104410868B - Method for rapid multi-file aggregation and reading in a shared file system

Info

Publication number: CN104410868B
Application number: CN201410600003.9A
Authority: CN
Inventors: 褚震宇; 徐荣波; 王付生
Original assignee: Beijing Dayang Technology Development Inc
Current assignee: Beijing Dayang Technology Development Inc
Priority date: 2014-10-31
Filing date: 2014-10-31
Publication date: 2017-11-17
Anticipated expiration: 2034-10-31
Also published as: CN104410868A

Abstract

The present invention relates to a method for rapid multi-file aggregation and reading in a shared file system, comprising: rapid multi-file aggregation; opening a file; obtaining metadata; reading or modifying the file; and closing the file. The rapid multi-file aggregation step comprises: checking the validity of the subfiles to be aggregated; checking whether the aggregation is an append aggregation; creating the aggregate file; establishing the mapping between the aggregate file and its subfiles; and processing the aggregated subfiles. The metadata-obtaining step comprises: allocating storage space; selecting the processing mode; checking whether the file whose metadata is requested is an aggregate file; obtaining the mapping between the aggregate file and its subfiles; and obtaining the metadata. With this rapid aggregation method, the files produced by multiple clients are aggregated into one large file in a short time, or multiple files are quickly appended to the end of an existing large file. No file copying takes place during aggregation, and the rapidly aggregated file is identical to the file that physical aggregation would produce.

Description

Method for rapid multi-file aggregation and reading in a shared file system

Technical field

The present invention relates to a method for rapid multi-file aggregation and reading in a shared file system. It is a method for rapidly aggregating the multiple files produced by cluster computing on a shared file system, such as distributed transcoding and distributed packaging in the audio and video field, and is suited to rapid multi-file aggregation under a shared file system in broadcast and television applications.

Background art

At present, a typical shared file system consists of a metadata server (MDS), shared storage and multiple clients, connected by a LAN and a SAN. The metadata server and the clients can access the shared storage directly over the FC or iSCSI protocol, giving the FC-SAN and IP-SAN architectures respectively. Because the SAN uses optical fiber transmission, it offers high bandwidth, large capacity and high speed, and is commonly used to transfer files with very large data volumes, such as video files.

Metadata in a SAN shared file system is the data structure that describes how data is organized. It mainly records how a file is divided across the block device, where it is stored, and some related attributes of the file in the SAN shared file system. Through metadata, the SAN shared file system organizes contiguous block-device storage into a file structure. Metadata is very small compared with file data and does not need high transmission bandwidth, so it is carried over the LAN. Metadata in the SAN shared file system is managed centrally by the metadata server, and clients connect to and communicate with the metadata server over TCP/IP on the LAN.

SAN file systems are widely used in the post-production of broadcast television programs. One important scenario is that multiple clients access files in the SAN shared file system at the same time, as in transcoding and packaging systems. Such systems are an unavoidable stage of program production and are also computationally intensive; for current high-definition production in particular they consume a large amount of computing time and reduce production efficiency. Distributed packaging and transcoding based on distributed computing emerged in response. This technique runs cluster computing on the SAN shared file system and greatly shortens processing time compared with a single machine. It has a common shortcoming, however. When cluster computing finishes, each client has generated a new file in the shared storage. One client must then read every file to its local storage, physically aggregate the files locally into a new large file, and finally write that large file back to the shared storage for other clients to access. This extra physical aggregation introduced by distributed computing takes a long time, and when many nodes take part in the computation, many files have to be aggregated and the aggregation takes even longer. Physical aggregation also slows the whole SAN for a long period and reduces the storage bandwidth available for the metadata server's external services. Moreover, physical aggregation reads every file once and writes the newly generated aggregate file once to the shared storage (typically a disk array), and these reads and writes consume much of the disk array's I/O capacity and the SAN's transmission bandwidth.

Summary of the invention

To overcome the problems of the prior art, the object of the present invention is to provide a method for rapid multi-file aggregation and reading in a shared file system. After cluster computing finishes, the method uses a specific aggregation procedure to rapidly aggregate the files produced by multiple clients into one large file in a short time, or to quickly append multiple files to the end of an existing large file. The actual file content is not copied during aggregation. For the client, reading and writing the rapidly aggregated file works exactly as it does for a file produced by physical aggregation, and the time taken by file aggregation is greatly reduced.

The object of the present invention is achieved as follows:

A method for rapid multi-file aggregation in a shared file system. The hardware system used by the method comprises: multiple clients connected to a metadata server and a disk array through a high-speed network that carries large files, the clients also being connected to the metadata server through a network that carries metadata, control information and interaction information. The steps of the method are as follows:

Subfile generation step: according to a preset aggregation rule, multiple clients each execute a different part of the same processing task, and each part is generated as an independent subfile;

Rapid multi-file aggregation step: the shared file system checks the validity of the subfiles to be aggregated and rapidly performs a "logical aggregation", rapidly aggregating the subfiles into one large aggregate file or quickly appending them to one.

Further, the rapid multi-file aggregation step comprises the following sub-steps:

Sub-step of checking subfile validity: when performing rapid file aggregation, the file system checks whether all subfiles to be aggregated are valid and satisfy the aggregation rule; if not, the rapid multi-file aggregation step is exited; if so, the next sub-step is entered;

Sub-step of checking whether the aggregation is an append aggregation: when performing rapid file aggregation, the file system checks whether the aggregation appends to an existing aggregate file. For an append aggregation, it must check whether the existing aggregate file satisfies the append aggregation rule. If it does, the method proceeds to the sub-step of establishing the mapping between the aggregate file and its subfiles; if it does not, the rapid multi-file aggregation step is exited. If the aggregation is not an append aggregation, the next sub-step is entered;

Sub-step of creating the aggregate file: the aggregate file is created according to the aggregate file information requested by the client and the attribute information of each subfile, and the relevant attribute information of the aggregate file is calculated;

Sub-step of establishing the mapping between the aggregate file and its subfiles: according to the relevant attribute information of each subfile to be aggregated, the relevant attribute information of the aggregate file is updated and the mapping between the aggregate file and each subfile is established.

Further, the rapid multi-file aggregation step also comprises a sub-step of processing the aggregated subfiles: after aggregation completes, the aggregated subfiles are processed so that clients can no longer see their information.

Further, the subfile generation step comprises the following sub-steps:

Sub-step of distributing the processing task: the processing task is split into multiple sub-tasks according to the requirements of the aggregation rule, and the sub-tasks are dispatched to different clients for execution;

Sub-step of computing the subfiles: the clients perform distributed computation on their assigned sub-tasks and each generates its corresponding subfile; if, while a subfile is being packaged, its length is not an integer multiple of the file system block size, blank data is appended to make up the shortfall where necessary.

Further, in the above method, the aggregation rule includes that the size of a subfile is an integer multiple of the file system block size;

Further, in the above method, the computation includes encoding format conversion and/or rendering and compositing;

Further, in the above method, the subfiles are generated separately by means of distributed computing;

Further, in the above method, if the processing task is a transcoding task or a packaging task, constant bit rate (CBR) coding is selected for the video compression coding of the generated target file;

A method for reading a file rapidly aggregated from multiple files in the above shared file system, comprising the following steps:

File opening step: the client sends a request to the metadata server to open the file to be read on the disk array;

Metadata processing step: according to the content of the file to be read, the client requests the metadata of that file from the metadata server and obtains the corresponding metadata; at the same time the client accepts the opportunistic lock assigned to it;

File reading step: the client obtains a metadata segment and, according to that segment, issues block data read requests to the disk array to read the block data corresponding to the segment. The client repeatedly requests metadata segments and reads the corresponding block data until all required data has been read;

File closing step: the client sends a request to the metadata server to close the open read handle, completing the read of this file.

Further, in the metadata processing step, in addition to normally handling the client's request for the metadata of the file to be operated on, the metadata server must also check whether the file whose metadata is requested is an aggregate file. The step comprises the following sub-steps:

Sub-step of allocating storage space: the client allocates storage space in local memory or on its hard disk for storing metadata;

Sub-step of selecting the processing mode: the client's metadata request is either placed in a background queue to wait or processed in real time. If it is to wait in the background, the request is put into the metadata request queue; if it is to be processed in real time, the next sub-step is entered;

Sub-step of checking whether the file is an aggregate file: the metadata server checks whether the file for which the client requests metadata is an aggregate file; if not, the sub-step of obtaining the metadata is entered directly; if so, the next sub-step is entered;

Sub-step of obtaining the mapping between the aggregate file and its subfiles: according to the metadata information of the aggregate file requested by the client, the mapping to the corresponding subfiles is obtained;

Sub-step of obtaining the metadata: the metadata of the corresponding file is obtained according to the request and returned to the client.
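The branch on aggregate files in the metadata processing step can be pictured with a small sketch. It is an illustration under assumptions that are not in the source: metadata is modeled as a list of `(start_block, block_count)` extents on the disk array, and the names `resolve_extents`, `plain_files` and `aggregates` are hypothetical.

```python
# Hypothetical model: the metadata of an ordinary file is a list of
# (start_block, block_count) extents on the shared disk array.
Extent = tuple[int, int]


def resolve_extents(name: str,
                    plain_files: dict[str, list[Extent]],
                    aggregates: dict[str, list[str]]) -> list[Extent]:
    """Return the extents a client would read for the file `name`."""
    if name not in aggregates:
        # Not an aggregate file: return its own metadata directly.
        return list(plain_files[name])
    # Aggregate file: follow the mapping and concatenate the subfile
    # extents in aggregation order. No data block is copied or moved.
    extents: list[Extent] = []
    for subfile in aggregates[name]:
        extents.extend(plain_files[subfile])
    return extents


plain = {"file1": [(100, 4)], "file2": [(300, 2), (500, 2)]}
aggs = {"mergefile": ["file1", "file2"]}
print(resolve_extents("mergefile", plain, aggs))
```

In this model the aggregate file owns no blocks of its own, which is one way to see why aggregation needs no copy work.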

The beneficial effects of the present invention are as follows. When cluster computing is performed, the method uses a specific rapid aggregation procedure to aggregate the files produced by multiple clients into one large file in a short time, or to quickly append multiple files to the end of an existing large file. No file copying takes place during aggregation, and for the client the rapidly aggregated file is identical to a physically aggregated file. The invention effectively shortens the time needed to aggregate multiple files when a shared file system is used for computation, and the more files are aggregated, the more pronounced the effect. For the transcoding or packaging and compositing common in audio and video processing, it greatly improves the efficiency of distributed transcoding and packaging. It also raises cluster computing speed and reduces the storage bandwidth consumed. Because distributed transcoding and distributed packaging are used frequently in the broadcast television industry, the further gain in their efficiency improves the efficiency of the overall television production workflow, which is especially significant for current high-definition program production. Once the sub-tasks of a distributed transcoding or packaging job are complete, there is no need to wait for a lengthy physical merge; the logical merge completes almost instantly, and the merged file can be submitted directly for subsequent review or broadcast, meeting the need of modern media organizations to deliver information as quickly as possible.

Brief description of the drawings

The invention is further described below with reference to the accompanying drawings and embodiments.

Fig. 1 is a schematic diagram of the hardware system used by the rapid file aggregation method of Embodiment 1 of the invention;

Fig. 2 is a flow chart of the rapid file aggregation method of Embodiment 1 of the invention;

Fig. 3 is a flow chart of the method for reading an aggregate file of Embodiment 1 of the invention;

Fig. 4 is a flow chart of the metadata processing step of Embodiment 1 of the invention.

Detailed description

Embodiment 1:

This embodiment is a method for rapid multi-file aggregation in a shared file system and for reading the resulting aggregate file. The hardware system used by the method comprises: multiple clients (only 3 are drawn in Fig. 1; there can be more in practice) connected to the metadata server and the disk array through a SAN that carries video files (shown in Fig. 1 with heavy lines and double lines), the clients also being connected to the metadata server through a LAN that carries metadata (shown in Fig. 1 with thin lines), as shown in Fig. 1.

A client can be an ordinary PC workstation or a server that is able to connect to the SAN and to process large files such as high-definition video files. The SAN in this embodiment is an optical network made up of fiber switches and optical cable; it is a broadband network with bandwidth above 1 Gbit/s that can carry high-definition video files. The SAN can also be built on Gigabit or 10-Gigabit Ethernet. The LAN in this embodiment is an Ethernet made up of Ethernet switches and using TCP/IP as its communication protocol; with several hundred Mbit/s of bandwidth it can quickly carry metadata as well as control and task interaction information. To prevent a metadata server failure from disrupting the whole SAN system, a standby metadata server is normally added, so that two metadata servers back each other up and provide metadata server redundancy on the basis of synchronized data. The shared storage device in the system is usually a disk array, connected to the clients and the metadata server through the SAN; clients and the metadata server access it over the FC or iSCSI protocol.

The basic idea of this embodiment is as follows. During distributed computing, multiple clients generate multiple files according to a preset aggregation rule. A specific rapid aggregation method then aggregates these files into one large file in a short time, or quickly appends them to the end of an existing large file. No file copying takes place during aggregation, and for the client the rapidly aggregated file is identical to the file physical aggregation would produce. The method is particularly suited to cases where large volumes of content from distributed transcoding or packaging are generated by distributed computing, merged into one file and then shared among multiple clients.

The method of this embodiment can be expressed as follows. After cluster computing, the client machines have generated multiple files in the shared storage, file1, file2, file3, ..., fileN, with sizes M1, M2, M3, ..., MN. Using a specific aggregation method, the SAN file system rapidly aggregates file1, file2, file3, ..., fileN into a new large file named mergefile, whose size M1+M2+M3+...+MN is the sum of the sizes of the aggregated files. The file size and other attributes of the rapidly aggregated file are identical to those of a physically aggregated file. This greatly reduces the time needed to aggregate multiple files and also reduces storage bandwidth usage.

In an ordinary SAN system, physically aggregating multiple files requires one client to read file1, file2, file3, ..., fileN to local storage, aggregate them locally into mergefile, and finally write mergefile back to the shared storage for other clients to access. This embodiment instead performs a "logical aggregation" of file1, file2, file3, ..., fileN within the shared storage. During logical aggregation all files stay in the shared storage, and no file is copied or read. This effectively shortens the time needed to aggregate multiple files when a SAN shared system is used for cluster computing, and the more files are aggregated, the greater the advantage. It also raises cluster computing speed and reduces the disk array storage bandwidth and SAN transmission bandwidth consumed during aggregation, so that the same disk array and SAN can be shared by more clients. Most importantly, because logical aggregation copies no actual file data, it completes almost instantly and saves valuable time in getting television programs reviewed and broadcast quickly; with high-definition program files being so large, this saving in background processing time is especially significant.

The detailed steps for implementing the rapid multi-file aggregation method in the shared file system of this embodiment are as follows:

1. Step of generating multiple subfiles: the client splits one processing task into multiple sub-tasks according to the aggregation rule and dispatches the sub-tasks to different clients; the clients perform distributed computation on their assigned sub-tasks and each generates its corresponding subfile; if, while a subfile is being packaged, its length is not an integer multiple of the file system block size, blank data is appended where necessary. The generated subfiles are written to the shared storage over the SAN. A specific implementation includes:

(1) Sub-step of distributing the processing task: the client splits one processing task into multiple sub-tasks according to the aggregation rule and dispatches the sub-tasks to different clients. Normally a large computing task is divided into multiple sub-tasks by a client with management authority and handed to several clients to compute in parallel. When distributing the task, the computing task must be split in advance according to the aggregation requirements and rules, so that the subfile each client finally generates meets the rapid aggregation requirements. For example, to transcode a program 2 hours long, the management client dispatches different parts of the 2-hour program to different clients according to the number of currently idle clients; these clients transcode simultaneously and each generates the subfile for its part.

The task should be split so that each client has roughly the same amount of computation, allowing the transcoding task to finish in the shortest time and be handed to the next stage. It must also satisfy the rapid aggregation requirement that every subfile except the last has a size that is an integer multiple of the file system block size. Normally the video of the transcoding target file is encoded at a constant bit rate (CBR), which makes it easier to determine the file size of each transcoded segment and hence to split the task. Some audio/video container formats allow blank filler data to be added after the actual data of each audio/video frame, which makes it possible to bring the subfile size to an integer multiple of the file system block size. For a transcoding or packaging task that is to use rapid file aggregation to speed up merging, the target file can therefore use CBR video coding with AVI or MXF OP1A as the container format. For container formats that do not allow blank filler data between audio/video data, the management client must calculate the exact file size at the end of each frame and find in and out points at which the subfile size is exactly an integer multiple of the file system block size, then form the sub-tasks accordingly and hand them to the other clients.
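The search for block-aligned in and out points can be sketched as follows. This is a simplified illustration, not the patented procedure: it assumes every frame occupies the same number of bytes (an idealized CBR case with no container header), and the function name `split_frames` and its parameters are hypothetical.

```python
from math import gcd


def split_frames(total_frames: int, n_clients: int,
                 frame_bytes: int, block_size: int) -> list[tuple[int, int]]:
    """Split [0, total_frames) into n_clients roughly equal segments.

    Every segment except the last covers a byte count that is an
    integer multiple of block_size.
    """
    # Smallest number of frames whose total bytes is a block multiple.
    unit = block_size // gcd(frame_bytes, block_size)
    base, extra = divmod(total_frames // unit, n_clients)
    segments = []
    start = 0
    for i in range(n_clients):
        if i == n_clients - 1:
            end = total_frames  # the last segment takes the remainder
        else:
            end = start + (base + (1 if i < extra else 0)) * unit
        segments.append((start, end))
        start = end
    return segments


segments = split_frames(total_frames=1000, n_clients=3,
                        frame_bytes=1536, block_size=4096)
print(segments)
```

With 1536-byte frames and 4096-byte blocks, every 8 frames end on a block boundary, so the segment boundaries are chosen as multiples of 8 frames.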

When N clients are available to execute sub-tasks, the computing task can be divided into N parts, with each client executing one part and the N resulting subfiles being rapidly merged. In this case the management client usually requests rapid file aggregation from the file system once all clients have finished. Alternatively, M sub-tasks can be generated, with M much larger than N; the clients execute the sub-tasks in order from first to last while the management client monitors subfile generation and submits rapid aggregation requests to the file system as subfiles become available. This case usually involves append aggregation of subfiles, and the final aggregate file can be passed to the next stage only when all sub-tasks have completed.
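One possible reading of the M-sub-task case is that the management client appends subfiles strictly in sub-task order, so a finished subfile waits until all earlier ones are done. The sketch below illustrates that reading only; the ordering policy and the name `next_batch` are assumptions, not details given in the text.

```python
def next_batch(finished: set[int], aggregated_count: int) -> list[int]:
    """Return the sub-task indices that can be appended now.

    `finished` holds indices of completed sub-tasks; the first
    `aggregated_count` sub-tasks are already in the aggregate file.
    Only a contiguous run starting at aggregated_count is returned.
    """
    batch = []
    index = aggregated_count
    while index in finished:
        batch.append(index)
        index += 1
    return batch


print(next_batch({0, 1, 3}, 0))  # sub-task 2 is still running
```

Here sub-task 3 stays out of the batch until sub-task 2 finishes, which keeps the content of the aggregate file in program order.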

(2) Sub-step of computing the subfiles: the clients perform distributed computation on their assigned segments, using CBR video compression coding, and generate files whose sizes are integer multiples of the file system block size, appending blank data to make up any shortfall. For the generated subfile to meet the preset size, the size of the generated data must be determined during encoding. If it already is an integer multiple of the file system block size, the data is packaged into a file directly; if not, blank data is appended after the data until it is. Because the rapid aggregation rule was already taken into account when the task was split, a client only needs to execute its sub-task and append blank data where necessary to produce a subfile that meets the requirements.
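The amount of blank data to append follows from simple modular arithmetic. The sketch below is illustrative: the helper names are hypothetical, and zero bytes stand in for the blank data, whereas a real container format such as AVI or MXF would use its own filler structure.

```python
def padding_needed(length: int, block_size: int) -> int:
    """Bytes to append so that length becomes a multiple of block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return (-length) % block_size


def pad_to_block(data: bytes, block_size: int) -> bytes:
    """Append zero bytes (an assumed stand-in for blank data)."""
    return data + b"\x00" * padding_needed(len(data), block_size)


print(padding_needed(5000, 4096))
print(len(pad_to_block(b"abc", 4)))
```

A subfile that is already block-aligned gets no padding, matching the case where the data is packaged into a file directly.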

2. Step of preparing for rapid multi-file aggregation: the information of each subfile to be aggregated is extracted.

3. Rapid multi-file aggregation step: the file system checks the validity of the subfiles to be aggregated and rapidly performs a "logical aggregation", rapidly aggregating the subfiles into one large aggregate file or quickly appending them to one.

This step differs from traditional physical aggregation as follows. Traditional physical aggregation works by copying: the first file to be aggregated is read, then the other files are read in aggregation order until all files have been aggregated. This step does not copy; it performs a rapid "logical aggregation" by establishing the mapping between the aggregate file and its subfiles. The rapid multi-file aggregation step therefore comprises the following sub-steps:

(1) Sub-step of checking subfile validity: when performing rapid file aggregation, the file system checks whether all subfiles to be aggregated are valid and satisfy the aggregation rule; if not, the rapid multi-file aggregation step is exited; if so, the next sub-step is entered.

This sub-step is a decision step that determines whether all subfiles to be aggregated are valid and satisfy the aggregation rule. The subfiles must satisfy certain file system rules: except for the last subfile, the size of every subfile must be an integer multiple of the file system block size, and no other client may be operating on these subfiles. Only then can clients access the aggregated file correctly. Because logical aggregation does not move or copy file data, any subfile other than the last whose size is not a block multiple would leave a "hole" in the aggregate file and cause errors when the file is read. If the aggregation rule is not satisfied, the rapid file aggregation step is exited.
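The two conditions in this check can be written down compactly. The sketch is a minimal illustration; the function name `subfiles_valid`, the `(name, size)` representation and the `busy` set of files in use by other clients are assumptions made for the example.

```python
def subfiles_valid(subfiles: list[tuple[str, int]], block_size: int,
                   busy: frozenset[str] = frozenset()) -> bool:
    """Check an ordered list of (name, size_in_bytes) subfiles.

    Every subfile except the last must be a whole number of blocks,
    and none may be in use by another client (`busy`).
    """
    if not subfiles:
        return False
    last = len(subfiles) - 1
    for position, (name, size) in enumerate(subfiles):
        if name in busy:
            return False
        if position != last and size % block_size != 0:
            return False
    return True


print(subfiles_valid([("file1", 8192), ("file2", 4096), ("file3", 100)], 4096))
```

Only the last subfile may end in the middle of a block, since nothing is placed after it in a fresh aggregation.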

(2) Sub-step of checking whether the aggregation is an append aggregation: when performing rapid file aggregation, the file system checks whether the aggregation appends to an existing aggregate file. For an append aggregation, it must check whether the existing aggregate file satisfies the append aggregation rule. If it does, the relevant attribute information of the aggregate file is updated and the method proceeds to the sub-step of establishing the mapping between the aggregate file and its subfiles; otherwise rapid multi-file aggregation is exited. If the aggregation is not an append aggregation, the next sub-step is entered.

In an append aggregation, the size of the last subfile already in the aggregate file must be an integer multiple of the file system block size; otherwise the file after appending would contain a "hole" and cause errors when it is read. This sub-step is a decision step that determines whether the aggregation is an append aggregation. If it is, the existing aggregate file must satisfy this rule of the SAN file system, that is, its last subfile must be a whole number of blocks in size, so that the other subfiles can be appended to it. When these conditions are met, the files can be correctly aggregated or appended; otherwise rapid multi-file aggregation is exited.
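The three outcomes of this decision can be summarized in a few lines. The sketch is illustrative only; the function name and the returned step labels are hypothetical.

```python
def next_step(is_append: bool, existing_last_subfile_size: int,
              block_size: int) -> str:
    """Decide what follows the append check (labels are illustrative)."""
    if not is_append:
        # Fresh aggregation: go on to create the aggregate file.
        return "create_aggregate_file"
    if existing_last_subfile_size % block_size == 0:
        # Existing aggregate ends on a block boundary: safe to append.
        return "establish_mapping"
    # Appending would leave a hole in the aggregate file.
    return "exit"


print(next_step(True, 4196, 4096))
```

An existing aggregate whose last subfile is 4196 bytes cannot be appended to with 4096-byte blocks, so the sketch returns the exit label.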

(3) Sub-step of creating the aggregate file: the aggregate file is created according to the aggregate file information requested by the client and the attribute information of each subfile, and the relevant attribute information of the aggregate file is calculated. The attribute information here is mainly the size of the aggregate file, which is obtained by summing the sizes of the subfiles.

(4) Sub-step of establishing the mapping between the aggregate file and its subfiles: the mapping between the aggregate file and each subfile is established according to the relevant attribute information of each subfile to be aggregated.

This sub-step is the key step of rapid multi-file aggregation; whether the aggregate file can be accessed correctly depends on it. Because the aggregate file is built by logical aggregation, each subfile actually remains a separate, independent file in the file system with no inherent relationship to the other subfiles or to the aggregate file. To access the aggregate file, a relationship between the aggregate file and all of its subfiles must therefore be established in some way. Here the mapping is established by means of an index. The index file first stores the information of the aggregate file, including its name, its size and the number of aggregated subfiles, and then stores, for each subfile in turn, its name, its offset within the aggregate file and its size. When the aggregate file is accessed, the offset being accessed quickly and precisely identifies the corresponding subfile, and the access is completed by reading the corresponding information from that subfile.
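The index described above can be modeled in memory as follows. This is a sketch under assumptions: the patent does not give an on-disk layout, so the dictionary structure, the `Entry` class and the functions `build_index` and `locate` are hypothetical, and binary search is just one convenient way to do the offset lookup.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Entry:
    name: str    # subfile name
    offset: int  # start of the subfile within the aggregate file
    size: int    # subfile size in bytes


def build_index(aggregate_name: str, subfiles: list[tuple[str, int]]) -> dict:
    """Record aggregate name, size, subfile count and per-subfile entries."""
    entries, offset = [], 0
    for name, size in subfiles:
        entries.append(Entry(name, offset, size))
        offset += size
    return {"name": aggregate_name, "size": offset,
            "count": len(entries), "entries": entries}


def locate(index: dict, offset: int) -> tuple[str, int]:
    """Map an offset in the aggregate file to (subfile name, local offset)."""
    if not 0 <= offset < index["size"]:
        raise ValueError("offset outside aggregate file")
    starts = [entry.offset for entry in index["entries"]]
    entry = index["entries"][bisect_right(starts, offset) - 1]
    return entry.name, offset - entry.offset


idx = build_index("mergefile",
                  [("file1", 8192), ("file2", 4096), ("file3", 1000)])
print(idx["size"], locate(idx, 12290))
```

The aggregate size is the sum of the subfile sizes, and an access at offset 12290 resolves to byte 2 of the third subfile without touching the first two.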

（5）The sub-step of processing the polymerization subfiles：After polymerization is completed, the polymerization subfiles are processed so that the client can no longer view the polymerization subfile information. That is, after polymerization is completed, the polymerized subfiles are specially marked: the metadata server of the file system hides the subfiles that have been polymerized, so that the client cannot directly access them.

The present embodiment also includes the method for reading the aggregate file, and this method comprises the following steps：

1st, the step of opening the file：The client sends a request to the metadata server to open the file to be read in the disk array. This is a basic step: when a user needs to read a file, the user opens a handle to the file to be operated on at the client, and the client sends the related operation request to the metadata server according to that file handle.

2nd, the step of processing metadata：According to the content of the file to be read, the client applies to the metadata server for the metadata corresponding to the file to be operated on; the client obtains the corresponding metadata information and, at the same time, accepts the opportunistic lock ("chance lock") allocated to it.

This step differs from metadata acquisition for a traditional SAN shared file as follows: a traditional SAN file system directly obtains the metadata information corresponding to the file content requested by the client, whereas this step must additionally judge, before the metadata of the file to be operated on is obtained in the normal way, whether that file is an aggregate file. Therefore, the client's processing of metadata in this step includes the following sub-steps：

（1）The sub-step of opening up storage space：The client opens up, in local memory or on the hard disk, storage space for storing metadata. In general it is rare for both of the dual metadata servers to fail, and the storage space can be opened up on the hard disk as needed.

（2）The sub-step of selecting the processing mode：The client chooses whether a metadata application request is placed in a background queue to wait or is processed in real time. If it is to wait in the background, the metadata application request is put into the metadata request queue to wait; if it is to be processed in real time, the next sub-step is entered. The client can process several file reads and writes at the same time, and multiple file read/write threads can be opened up to improve efficiency, so there are operation queues served by multiple threads, and each request to be processed is taken out of the queue and handled. If the request runs in the background, the following steps are added to the request queue and handed to a previously created processing thread; if not, they are processed directly in the current thread.
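The choice between background queuing and real-time processing can be sketched as below. The class `MetadataRequestDispatcher` and its methods are hypothetical names introduced for illustration only; the sketch assumes that a request is an arbitrary non-`None` object and that `handler` stands in for the remaining sub-steps.

```python
import queue
import threading
from typing import Any, Callable


class MetadataRequestDispatcher:
    """Runs a request in the calling thread (real time) or hands it to
    previously created worker threads through a request queue (background)."""

    def __init__(self, handler: Callable[[Any], None], workers: int = 2) -> None:
        self._handler = handler
        self._queue: "queue.Queue[Any]" = queue.Queue()
        self._threads = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]
        for thread in self._threads:
            thread.start()

    def _run(self) -> None:
        while True:
            request = self._queue.get()
            try:
                if request is None:  # sentinel: stop this worker
                    return
                self._handler(request)
            finally:
                self._queue.task_done()

    def submit(self, request: Any, background: bool) -> None:
        if background:
            self._queue.put(request)  # wait in the metadata request queue
        else:
            self._handler(request)    # process directly in this thread

    def close(self) -> None:
        """Stop the workers after all queued requests have been handled."""
        for _ in self._threads:
            self._queue.put(None)
        for thread in self._threads:
            thread.join()
```

A caller would create one dispatcher, call `submit(request, background=True)` for requests that may wait, and `submit(request, background=False)` for requests that must be handled immediately.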

（3）The sub-step of detecting whether the file whose metadata is requested is an aggregate file：The metadata server detects whether the file for which the client initiated the metadata request is an aggregate file; if "No", "the sub-step of obtaining metadata" is entered, and if "Yes", the next sub-step is entered. This sub-step is a detection and judgment step: it judges whether the file whose metadata the client requests is an aggregate file. If the request is for the metadata information of an aggregate file, the corresponding subfile metadata information must be obtained according to the mapping relations between the aggregate file and the polymerization subfiles, and in this case the next sub-step is entered. If the request is for the metadata information of a non-aggregate file, the corresponding metadata information is obtained directly according to the request, and the sub-step of obtaining metadata is entered.

（4）The sub-step of obtaining the mapping relations between the aggregate file and the polymerization subfiles：According to the metadata of the aggregate file requested by the client, the mapping relations of the corresponding polymerization subfiles are obtained.

（5）The sub-step of obtaining metadata：The metadata server obtains the metadata of the corresponding file to be operated on according to the communication rules and returns the metadata information to the client. Obtaining the metadata of an aggregate file is the same process as obtaining the metadata of an ordinary file, except that for an aggregate file the subfile corresponding to the requested metadata information must first be determined; the corresponding metadata information of that subfile is then obtained and returned to the client, and this is repeated until the metadata information of all required data has been obtained. With this sub-step, the whole "step of processing metadata" ends.
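The loop described in sub-steps （3） to （5） can be sketched as a range resolution over the subfile mapping. This is a simplified illustration: the functions `resolve_range` and `get_metadata`, the tuple layout `(name, start, size)`, and the dictionary `aggregate_indexes` are assumptions made here, and the returned tuples merely stand in for real metadata records.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, int, int]    # (subfile name, start offset in aggregate, size)
Segment = Tuple[str, int, int]  # (file name, offset inside that file, length)


def resolve_range(entries: List[Entry], offset: int, length: int) -> List[Segment]:
    """Split a byte range of the aggregate file into per-subfile segments."""
    segments = []
    end = offset + length
    for name, start, size in entries:
        lo = max(offset, start)      # overlap of the request with this subfile
        hi = min(end, start + size)
        if lo < hi:
            segments.append((name, lo - start, hi - lo))
    return segments


def get_metadata(
    file_name: str,
    offset: int,
    length: int,
    aggregate_indexes: Dict[str, List[Entry]],
) -> List[Segment]:
    """Return the segments to read, detecting first whether the file is an
    aggregate file (present in aggregate_indexes) or an ordinary file."""
    entries = aggregate_indexes.get(file_name)
    if entries is None:
        return [(file_name, offset, length)]        # ordinary file: direct
    return resolve_range(entries, offset, length)   # aggregate file: via mapping
```

A request that crosses a subfile boundary is answered with one segment per subfile, which corresponds to the repeated acquisition of subfile metadata until all required data is covered.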

3rd, the step of reading the file：The client obtains a metadata segment and, according to the metadata segment obtained, sends the disk array a request for the block data of the file, so as to complete the block data read operation corresponding to that metadata segment; the client repeatedly applies for a metadata segment and reads the block data corresponding to it until all required data has been read.
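The read loop of this step can be sketched as follows. The function `read_file` and its two callbacks are hypothetical: `fetch_segment` stands in for the metadata application to the metadata server, and `read_blocks` stands in for the block data request to the disk array.

```python
from typing import Callable, Tuple

Segment = Tuple[int, int]  # (offset, length) described by one metadata segment


def read_file(
    total_length: int,
    segment_length: int,
    fetch_segment: Callable[[int, int], Segment],
    read_blocks: Callable[[Segment], bytes],
) -> bytes:
    """Alternately apply for a metadata segment and read its block data
    until total_length bytes have been read."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    data = bytearray()
    offset = 0
    while offset < total_length:
        want = min(segment_length, total_length - offset)
        segment = fetch_segment(offset, want)  # ask the metadata server
        data += read_blocks(segment)           # ask the disk array
        offset += want
    return bytes(data)
```

In a test setting both callbacks can be backed by an in-memory byte string, for example `fetch_segment = lambda off, n: (off, n)` and `read_blocks = lambda seg: storage[seg[0]:seg[0] + seg[1]]`.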

4th, the step of closing the file：The client sends a request to the metadata server to close the handle of the file that was opened, completing the reading of this file.

Finally, it should be noted that the above is intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred arrangement, those of ordinary skill in the art will understand that the technical solution of the present invention（for example the manner of obtaining metadata, the manner of reading files, the order of the steps, etc.）may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present invention. The method of the present invention can be written as a program for a computer system and run in the computer network system described in the present invention.

Claims

1. A shared-file system multifile rapid polymerization method, wherein the hardware system used by the method includes：multiple clients connected to a metadata server and a disk array through a high-speed network for transmitting large files, the multiple clients also being connected to the metadata server through a network for transmitting metadata and control and interaction information; the steps of the method are as follows：

The step of generating subfiles：Multiple clients each perform a different part of the same processing task according to preset polymerization rules, and each part is generated as an independent subfile；

The step of multifile rapid polymerization：The shared-file system checks the validity of the multiple subfiles that need to be polymerized and quickly performs "logical aggregate", rapidly polymerizing the multiple subfiles into one aggregate file or rapidly appending them to an aggregate file; the "logical aggregate" means that, during polymerization, all files remain in the shared storage at all times and no copying or reading of the files is performed；

Characterized in that,

The "step of multifile rapid polymerization" includes the following sub-steps：

The sub-step of detecting the validity of the polymerization subfiles：When performing file rapid polymerization, the file system detects whether all subfiles that need to be polymerized are valid and satisfy the polymerization rules; if "No", the multifile rapid polymerization step is exited, and if "Yes", the next sub-step is entered；

The sub-step of detecting whether the polymerization is an additional polymerization：When performing file rapid polymerization, the file system detects whether the polymerization is an additional polymerization carried out on the basis of an existing aggregate file. If it is an additional polymerization, it is necessary to detect whether the existing aggregate file satisfies the additional polymerization rules: if it is an additional polymerization and the additional polymerization rules are satisfied, the sub-step of establishing the mapping relations between the aggregate file and the polymerization subfiles is entered; if it is an additional polymerization but the additional polymerization rules are not satisfied, the multifile rapid polymerization step is exited. If it is not an additional polymerization, the next sub-step is entered；

The sub-step of creating the aggregate file：The aggregate file is created according to the aggregate file information requested by the client and the attribute information of each subfile, and the relevant attribute information of the aggregate file is calculated；

The sub-step of establishing the mapping relations between the aggregate file and the polymerization subfiles：According to the relevant attribute information of each subfile to be polymerized, the relevant attribute information of the aggregate file is updated and the mapping relations between the aggregate file and each subfile are established.

2. The shared-file system multifile rapid polymerization method as claimed in claim 1, characterized in that the step of multifile rapid polymerization further includes the sub-step of processing the polymerization subfiles：After polymerization is completed, the polymerization subfiles are processed so that the client can no longer view the polymerization subfile information.

3. The shared-file system multifile rapid polymerization method as claimed in claim 1, characterized in that the "step of generating subfiles" includes the following sub-steps：

The sub-step of distributing the processing task：The same processing task is split into multiple sub-processing tasks according to the requirements of the polymerization rules, and the sub-tasks are dispatched to different clients for execution；

The sub-step of calculating the subfiles：The multiple clients perform distributed calculation on the sub-processing tasks distributed to them and each generates its corresponding subfile; during encapsulation of a subfile, if the subfile length is not an integral multiple of the file system block size, the insufficient portion is padded with blank data where necessary.

4. The shared-file system multifile rapid polymerization method as claimed in claim 1, characterized in that the polymerization rules include that the size of each subfile is an integral multiple of the file system block size.

5. The shared-file system multifile rapid polymerization method as claimed in claim 1, characterized in that the calculation includes coding format conversion and/or rendering and synthesis.

6. The shared-file system multifile rapid polymerization method as claimed in claim 1, characterized in that the subfiles are respectively generated by means of distributed calculation.

7. The shared-file system multifile rapid polymerization method as claimed in claim 1, characterized in that, if the processing task is a transcoding task or a packing task, the video compression coding of the generated target file uses constant bit rate (CBR) coding.

8. A method for reading the aggregate file as claimed in claim 1, the steps of the method including：

The step of opening the file：The client sends a request to the metadata server to open the file to be read in the disk array；

The step of processing metadata：According to the content of the file to be read, the client applies to the metadata server for the metadata corresponding to the file to be read; the client obtains the corresponding metadata and, at the same time, accepts the opportunistic lock ("chance lock") allocated to it；

The step of reading the file：The client obtains a metadata segment and, according to the metadata segment obtained, sends the disk array a request for the block data of the file, so as to complete the block data read operation corresponding to that metadata segment; the client repeatedly applies for a metadata segment and reads the block data corresponding to it until all required data has been read；

The step of closing the file：The client sends a request to the metadata server to close the handle of the file that was opened for reading, completing the reading of this file；

Characterized in that, in the "step of processing metadata", in addition to normally processing the client's request for the metadata of the file to be operated on, the metadata server also needs to detect whether the file whose metadata is requested is an aggregate file, the step including the following sub-steps：

The sub-step of opening up storage space：The client opens up, in local memory or on the hard disk, storage space for storing metadata；

The sub-step of selecting the processing mode：The client chooses whether a metadata application request is placed in a background queue to wait or is processed in real time; if it is to wait in the background, the metadata application request is put into the metadata request queue to wait, and if it is to be processed in real time, the next sub-step is entered；

The sub-step of detecting whether the file whose metadata is requested is an aggregate file：The metadata server detects whether the file for which the client initiated the metadata request is an aggregate file; if "No", "the sub-step of obtaining metadata" is entered, and if "Yes", the next sub-step is entered；

The sub-step of obtaining the mapping relations between the aggregate file and the polymerization subfiles：According to the metadata information of the aggregate file requested by the client, the mapping relations of the corresponding polymerization subfiles are obtained；