KR20060073724A

KR20060073724A - Method and device for storing and downloading duplicate files by using file information

Info

Publication number: KR20060073724A
Application number: KR1020040112109A
Authority: KR
Inventors: 서영준; 노재영
Original assignee: 주식회사 나우콤
Priority date: 2004-12-24
Filing date: 2004-12-24
Publication date: 2006-06-29
Anticipated expiration: 2024-12-24
Also published as: KR100700200B1

Abstract

A method and apparatus for storing and downloading duplicate files using file information.

According to an embodiment of the present invention, a method of storing duplicate files using file information includes the following steps:
- receiving, from a client, check data for at least a portion of a first file;
- searching for a second file having the same check data as the received check data;
- if the second file exists, comparing the first file from the client with the second file to extract the identical portion and the differing portion; and
- storing only the differing portion of the first file, with the identical portion of the first file referring to the second file.

Duplicate file, file server, CRC, segment

Description

Method and apparatus for storing and downloading duplicate files using file information

FIG. 1 is an exemplary view showing the conventional case in which duplicate large files are stored separately for each user.

FIG. 2 is an exemplary view in which multiple identical files are stored as a single file according to an embodiment of the present invention.

FIG. 3 is an exemplary view showing the configuration of a client and a file server according to an embodiment of the present invention.

FIG. 4 is an exemplary view illustrating the process of handling partial duplication between two files according to an embodiment of the present invention.

FIG. 5 is an exemplary view in which the differing parts of duplicate files are stored in a file server according to an embodiment of the present invention.

FIG. 6 is a flowchart illustrating the process by which a file server receives a file from a client according to an embodiment of the present invention.

FIG. 7 is a sequence diagram showing operations between a client and a server according to an embodiment of the present invention.

<Explanation of reference numerals for the main parts of the drawings>

500: client    520: segment generator

530: check file generator    950: file information unit

960: file storage unit    1000: file server

Advances in multimedia technology and the commercialization of mass storage media have steadily increased both the size of multimedia files and the storage capacity of digital devices. Users are also moving away from storing files on their own digital devices. Instead, they increasingly use storage devices run by companies that offer large amounts of storage space.

However, many of the multimedia files kept in such large storage spaces are duplicates. This hurts storage efficiency for two reasons:
- users rarely edit or modify multimedia files; and
- different users are likely to store the same multimedia file at around the same time.

Many programs are currently available for finding duplicate files on a user's personal digital storage device. They examine the files stored on a personal computer and locate duplicates by checking individual files against one another, one pair at a time.

This approach does not scale to a storage service where many users store files, for two reasons:
- Searching a huge number of individual files one by one for duplicates takes an enormous amount of time, which is impractical for a file storage server that provides an actual storage service.
- Such a server is not a single computer but many interconnected computers with mass storage devices, so inspecting every storage device is not feasible.

In a storage service, storing duplicate files wastes more than storage space. FIG. 1 shows how storing duplicate large files reduces both storage efficiency and network efficiency.

Users 10, 20 and 90 of the file server 1000 are each uploading the same file, Movie_nemo.avi. Interest in a given piece of multimedia content tends to peak at a particular time, so many copies of the same content often end up on the file server 1000. As a result:
- identical files 201, 202 and 209 are stored on the file server 1000, wasting space; and
- the same file is uploaded repeatedly, increasing network load.

A method is therefore needed that avoids duplication among files uploaded to a large file server and handles duplicate files efficiently. Such a method would reduce both wasted storage space and network load.

A technical object of the present invention is to improve storage-space efficiency by not storing the duplicated portions of files.

Another technical object of the present invention is to reduce network load by limiting uploads of duplicate files.

The objects of the present invention are not limited to those mentioned above. Other objects not mentioned will be clearly understood by those skilled in the art from the following description.

According to another embodiment of the present invention, a method of storing duplicate files using file information includes the following steps:
- receiving, from a client, check data for at least some files of a first file group;
- searching for a second file group having the same check data as the received check data;
- if the second file group exists, comparing a first file of the first file group from the client with a second file of the second file group; and
- storing a reference to the second file for each first file that is identical to it, and storing each first file that is not identical to it.

According to an embodiment of the present invention, a method of downloading duplicate files using file information includes the following steps:
- receiving, from a client, information about a first file to be downloaded;
- searching for a second file that shares identical portions with the first file; and
- if the second file exists, sending the client the portions common to both files, extracted from the second file, and the portions of the first file that differ from the second file, extracted from the first file.

A file server according to an embodiment of the present invention includes:
- a transceiver that receives, from a client, check data for at least a portion of a first file;
- a file information unit that searches for a second file having the same check data as the received check data and stores the received check data; and
- a file storage unit that, if the second file exists, compares the first file from the client with the second file to extract the identical and differing portions. It stores the identical portion as a reference to the second file and stores the differing portion using the corresponding part of the first file.

Before the detailed description, the meanings of terms used in this specification are briefly explained. These explanations are intended only to aid understanding of the specification. Unless a term is explicitly stated to limit the present invention, its explanation should not be read as limiting the technical idea of the invention.

- CRC (Cyclic Redundancy Checking)

CRC is one method of checking whether data transmitted over a communication link contains errors. It works as follows:
1. The transmitter computes a code from the data block using a 16-bit or 32-bit generator polynomial (the remainder of a polynomial division) and appends the code to the block.
2. The receiver applies the same polynomial to the data and compares the result with the code sent by the transmitter.
3. If the two match, the data was received successfully. Otherwise, the receiver asks the transmitter to resend the data block.

ITU-T has standardized the 16-bit polynomial used to produce the code appended to transmitted blocks. A 16-bit CRC detects 99.998% of all possible errors, including cases where two bits are corrupted at the same time. This level of detection is considered sufficient for data blocks of 4 KB or less. Larger transfers use a 32-bit CRC, and both the Ethernet and Token Ring protocols use a 32-bit CRC.

The checksum is a somewhat simpler method with weaker error detection. It computes a simple value, such as a count or sum, over the bits of a transmission unit so that the receiver can verify that the data arrived as sent. If the receiver's calculation matches, the unit is considered received without error. Both the TCP and UDP layers provide checksum calculation and verification.
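The append-and-compare scheme can be sketched with Python's standard-library `zlib.crc32`, which uses the same 32-bit CRC polynomial as Ethernet. This is a minimal illustration, not the patent's implementation. The 4-byte big-endian trailer layout and the function names `make_frame` and `check_frame` are assumptions made for the example.

```python
import struct
import zlib


def make_frame(block: bytes) -> bytes:
    """Sender side: append the CRC-32 of the block as a 4-byte big-endian trailer."""
    return block + struct.pack(">I", zlib.crc32(block))


def check_frame(frame: bytes) -> bool:
    """Receiver side: recompute the CRC-32 over the payload and compare it with the trailer."""
    payload, trailer = frame[:-4], frame[-4:]
    (sent_crc,) = struct.unpack(">I", trailer)
    return zlib.crc32(payload) == sent_crc


frame = make_frame(b"segment payload")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit of the payload
print(check_frame(frame), check_frame(corrupted))  # True False
```

A CRC-32 is guaranteed to catch any single-bit error, so the corrupted frame is rejected. On a mismatch, the receiver would request retransmission.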

- File

A file is normally the unit in which data is handled, so a single piece of multimedia content or a single work usually consists of one or more files. In the main embodiment of this specification, one file holds one meaningful unit of data, but the invention is not limited to this.

Each segment, described later, may also exist as a separate file. In particular, a file server may divide a file that was originally one file into several segments, by a fixed size or another criterion, so that duplication can be checked. Each segment may then be stored as its own file. Storage on the file server therefore covers both cases: keeping a file as a single file and keeping it as multiple files, one per segment.

- Check file

A check file is one kind of check data. Check data can be obtained by applying a CRC or checksum to a file or part of a file. It may also include the file size, file name, file type and so on.

A check file is check data that the file server stores in the form of a file. The server can then use the check file as check data during an upload to determine whether two files are identical. In this specification, part of the check data is stored in a database and part as files, so the check data is called a check file as one embodiment.

However, the present invention is not limited to this. Check data need not be stored as files and may instead be computed at each upload, or it may be stored in a database and searched there.
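The sketch below illustrates check data kept as a file. It bundles a CRC with the file size, name and type and serializes the result to JSON text. The JSON format, the field names and the use of the file extension as the "file type" are all assumptions, since the patent does not specify a check-file layout.

```python
import json
import os
import zlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CheckData:
    crc32: int      # CRC-32 over the file (or over the part being checked)
    size: int       # size in bytes
    name: str       # file name as uploaded
    file_type: str  # assumption: the lower-cased extension stands in for "file type"


def make_check_data(name: str, data: bytes) -> CheckData:
    ext = os.path.splitext(name)[1].lower()
    return CheckData(crc32=zlib.crc32(data), size=len(data), name=name, file_type=ext)


def to_check_file(cd: CheckData) -> str:
    """Serialize check data so the server can keep it as a small 'check file'."""
    return json.dumps(asdict(cd), sort_keys=True)


def from_check_file(text: str) -> CheckData:
    """Load check data back from its serialized check-file form."""
    return CheckData(**json.loads(text))


cd = make_check_data("Movie_nemo.avi", b"example bytes")
print(to_check_file(cd))
```

Keeping the check data precomputed like this means an upload only needs a lookup, not a recomputation over the stored file.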

Multiple users are uploading the Movie_nemo.avi file to the file server 1000. Only one user's copy, Movie_nemo.avi file 210, is stored, and the other users are notified that the file has been stored. A counter 291 records that N users have stored the Movie_nemo.avi file held on the file server 1000. The counter is then maintained as follows:
- If K of those N users later request deletion, the file server 1000 does not delete the file but changes the counter 291 to N-K.
- When the counter reaches 0, no user has the file stored any longer, so the file server 1000 deletes it.
- If someone tries to store the same file again, the counter 291 is incremented by 1.
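The counter mechanism is essentially reference counting. A minimal sketch follows, assuming a hypothetical `DedupStore` class in which `key` is whatever identifies an identical file (for example, its check data). The physical copy is kept only while at least one owner remains.

```python
from typing import Dict, Optional


class DedupStore:
    """One physical copy per distinct file plus an owner counter (cf. counter 291)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.counters: Dict[str, int] = {}

    def save(self, key: str, data: Optional[bytes] = None) -> bool:
        """Register one more owner of `key`; return True only if data was physically stored."""
        if key in self.counters:
            self.counters[key] += 1  # already stored: only count the new owner
            return False
        if data is None:
            raise ValueError("the first upload of a file must include its data")
        self.blobs[key] = data
        self.counters[key] = 1
        return True

    def delete(self, key: str) -> None:
        """Drop one owner; remove the physical copy only when the counter reaches 0."""
        self.counters[key] -= 1
        if self.counters[key] == 0:
            del self.counters[key]
            del self.blobs[key]


store = DedupStore()
store.save("nemo", b"...movie bytes...")  # user 10: data stored
store.save("nemo")                        # user 20: counter only
store.save("nemo")                        # user 90: counter only
store.delete("nemo")
print(store.counters["nemo"])  # 2
```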

FIG. 3 is an exemplary view showing the configuration of a client and a file server according to an embodiment of the present invention.

The term "unit" used in this embodiment, as well as "module" or "table", means software or a hardware component such as an FPGA or ASIC, and a module performs certain functions. However, a module is not limited to software or hardware. A module may be configured to reside on an addressable storage medium and to execute on one or more processors.

By way of example, a module therefore includes components such as software components, object-oriented software components, class components and task components. It also includes processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays and variables.

The functionality provided in the components and modules may be combined into fewer components and modules or further separated into additional ones. In addition, the components and modules may be implemented to execute on one or more CPUs in a device or a secure multimedia card.

The client 500 is installed on the users' computers 10, 20 and 90 described above and uploads and downloads files. It mainly comprises:
- a transceiver 510 that sends and receives files and data;
- a segment generator 520 that divides a file into segments by a fixed size or another criterion;
- a check file generator 530 that generates a check file for a whole file or for a segment; and
- a file storage unit 560 in which files are stored.

The transceiver 510 exchanges data and files with the file server over a wired or wireless network. A file sent or received may be a whole file or one of its segments.

The segment generator 520 divides a file by a fixed length or another criterion. Checking an entire file for duplication at once can take time, so the file is split into pieces of a given size. A large file that has already been divided into pieces of that size may be transmitted without passing through the segment generator 520.
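Fixed-size segmentation can be sketched as a generator that reads a binary stream in chunks. The 64 KB default is one of the example segment sizes given later for FIG. 4. The function name `iter_segments` is hypothetical.

```python
import io
from typing import BinaryIO, Iterator

SEGMENT_SIZE = 64 * 1024  # 64 KB, one of the example segment sizes given for FIG. 4


def iter_segments(stream: BinaryIO, size: int = SEGMENT_SIZE) -> Iterator[bytes]:
    """Yield consecutive fixed-size segments of a binary stream; the last may be shorter."""
    while True:
        chunk = stream.read(size)
        if not chunk:  # b"" signals end of stream
            return
        yield chunk


demo = io.BytesIO(b"x" * 150_000)
print([len(seg) for seg in iter_segments(demo)])  # [65536, 65536, 18928]
```

For a real file, pass a stream from `open(path, "rb")`. Its buffered `read` returns the full requested size except at end of file, so only the final segment is short.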

The check file generator 530 extracts information from a file, or from a segment of a file, and generates a check file. The check file lets the server determine whether two files or segments are identical. It may be generated with the CRC method described above or with a checksum. A check file can also be built from other information about a file or segment, such as its bit content and size.
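One plausible check-file layout stores a single 4-byte CRC-32 per segment, in segment order. This matches the 4-byte check size mentioned for FIG. 4. The packing format and the function names are assumptions for illustration only.

```python
import struct
import zlib
from typing import List


def make_check_file(segments: List[bytes]) -> bytes:
    """Pack one big-endian CRC-32 per segment, in segment order, into a compact check file."""
    return b"".join(struct.pack(">I", zlib.crc32(seg)) for seg in segments)


def read_check_file(blob: bytes) -> List[int]:
    """Recover the per-segment CRC-32 values from a check file."""
    return [crc for (crc,) in struct.iter_unpack(">I", blob)]


check = make_check_file([b"seg-A", b"seg-B", b"seg-A"])
print(len(check))  # 12
```

Equal entries mark segments that are very likely identical, although a matching 32-bit CRC is strong evidence of identity rather than proof.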

The file storage unit 560 is the medium that holds files to be uploaded or stores downloaded files. Large files were traditionally stored on hard disks, but storage media have recently diversified to include flash memory, USB storage devices and the like.

Before uploading a file to the file server 1000, the client 500 sends the server information about the file. The upload proceeds as follows:
1. The client prepares information that lets the server judge whether files are identical. This may be a check file for the whole file or for its first segment, together with the file size and file name.
2. If the multimedia content exists as a single file, the segment generator 520 divides it into segments, and the check file generator 530 then generates check files.
3. The transceiver 510 sends this file information (for example, the check file, file size and file name) to the file server 1000, which checks whether an identical file is already stored.
4. If the file is completely identical to a stored file, the file server 1000 does not receive the file and stores only information about the file the client wanted to upload.
5. If the file differs in part, the file server asks the client 500 to send only the differing part.

The method of determining the differing parts is described later.
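The server's decision before any file data is transferred can be sketched as follows. The sketch assumes the client sends a list of per-segment CRC-32 values and that the server keeps an index from first-segment CRC to stored file. The function and parameter names are hypothetical. The three outcomes correspond to an entirely new file, a completely identical file and a partly different file.

```python
from typing import Dict, List, Optional, Tuple


def classify_upload(
    stored: Dict[str, List[int]],
    first_index: Dict[int, str],
    upload_crcs: List[int],
) -> Tuple[str, Optional[str], List[int]]:
    """Decide what the server needs before any file data is transferred.

    stored:      file id -> per-segment CRC-32 values of files already on the server
    first_index: first-segment CRC-32 -> file id, so lookup needs no scan of all files
    upload_crcs: per-segment CRC-32 values the client sends ahead of the upload
    Returns (verdict, original file id or None, indexes of segments to upload).
    """
    original = first_index.get(upload_crcs[0]) if upload_crcs else None
    if original is None:
        return "new", None, list(range(len(upload_crcs)))
    base = stored[original]
    needed = [i for i, c in enumerate(upload_crcs) if i >= len(base) or base[i] != c]
    if not needed and len(base) == len(upload_crcs):
        return "identical", original, []  # store only information about the upload
    return "partial", original, needed    # request only the differing segments


stored = {"101": [11, 22, 33, 44]}
print(classify_upload(stored, {11: "101"}, [11, 22, 99, 44]))  # ('partial', '101', [2])
```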

The file server 1000 includes a transceiver 910, a segment generator 920, a check file generator 930, a file information unit 950 and a file storage unit 960.

The transceiver 910 exchanges files and data with the client 500 over a wired or wireless network. When the client 500 transmits a file, the segment generator 920 divides it into segments and stores them.

If no stored file duplicates the uploaded file, the client 500 may send the file as-is without dividing it into segments. In that case, the segment generator 920 divides the received file into segments so that it can serve as an original against which later uploads are checked for duplication.

The check file generator 930 generates a check file for each segment, depending on whether a duplicate exists:
- If no duplicate of the file is stored on the file server 1000, check files can be generated for all segments.
- If a duplicate does exist, only the segments corresponding to the differing parts are stored, and check files are generated for those segments.

Later, when a user downloads the file, the segments of the original file and the segments that differ from the original can be told apart and downloaded. This is described in detail with reference to FIG. 4.

The file information unit 950 stores information about each file. This may include part of a file's check file and other information needed to judge whether files are identical, such as the file size and file name.

The unit also tracks files uploaded on the basis of an original file, whether identical to it or partly different. It makes it possible to find the original from such a file. While other files still refer to the original, it can prevent the original from being deleted. File information can be stored in a database, in information files and so on.
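Since the file information may live in a database, the sketch below uses Python's built-in `sqlite3` with an in-memory database. The table layout, column names and helper functions are all hypothetical. Matching the size exactly is a simplification, because the patent also allows files of similar size to be treated as likely duplicates. The `can_delete` check mirrors the rule that an original must not be deleted while another file refers to it.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE files (
    file_id   TEXT PRIMARY KEY,
    name      TEXT NOT NULL,
    size      INTEGER NOT NULL,
    first_crc INTEGER NOT NULL,  -- part of the check data, used to find candidates
    original  TEXT               -- id of the original this file refers to, or NULL
);
CREATE INDEX files_by_first_crc ON files (first_crc);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def find_candidates(conn: sqlite3.Connection, first_crc: int, size: int) -> List[str]:
    """Original files whose first-segment CRC and size match an incoming upload."""
    rows = conn.execute(
        "SELECT file_id FROM files WHERE first_crc = ? AND size = ? AND original IS NULL",
        (first_crc, size),
    )
    return [file_id for (file_id,) in rows]


def can_delete(conn: sqlite3.Connection, file_id: str) -> bool:
    """An original must be kept while any other file still refers to it."""
    (refs,) = conn.execute(
        "SELECT COUNT(*) FROM files WHERE original = ?", (file_id,)
    ).fetchone()
    return refs == 0
```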

The file storage unit 960 is the storage medium in which files are stored and is suited to holding large files. It may be distributed across several systems. The file information unit 950 can provide the information needed to find which system's file storage unit 960 holds a given file.

In the embodiment of FIG. 3, both the file server and the client have a check file generator, and check files are stored. This is only one embodiment: check data can instead be generated and compared at upload or download time. For upload speed, however, it is preferable to store check data as files or, failing that, in an easily searchable store such as a database.

FIG. 4 is an exemplary view illustrating the process of handling partial duplication between two files according to an embodiment of the present invention.

If multimedia files are entirely identical, only one copy needs to be uploaded. The number of users who have stored that file can then be tracked in a database, a file or a similar record.

There are two ways to determine whether files are identical: examine every bit of the files, or compare information that represents them. For multimedia files, examining every bit takes a long time because of their size.

Comparing information that represents a file can therefore take less time. Such information may be any of the following:
- the file size;
- the file name;
- a specific part of the file; or
- the bit pattern or CRC check value of that specific part.

In FIG. 4, file identity is determined by checking bit patterns or similar properties of files and comparing the results. File 101 is a file that User 1 wants to upload, stored on a personal computer, digital device or the like. The duplicate check finds no existing copy of this file, so it is stored on the file server as an original file.

The file is stored divided into one or more pieces (hereinafter "segments") 201, 202, 203 and so on. In this embodiment of the present invention, the segments are pieces of the file divided by a fixed size, and each segment is stored on the file server.

Check files 301, 302 and so on, each representing one segment, may also be generated. A check file represents the characteristics of its segment and serves as the basis for distinguishing two segments. As shown in FIG. 4, check file 1 (301) represents segment 1 (201), and check file 2 (302) represents segment 2 (202). Check files are typically generated using a CRC, a checksum or a similar method.

The next user, User 2, wants to upload file 102. To check whether file 102 matches a file already on the server, the client sends the file server the check file 311 of the first segment 211 of file 102. The file server compares check file 311 with the stored check file 301 and finds that segments 201 and 211 are the same.

If file 102 is also the same size as, or similar in size to, the file uploaded by User 1, the two files are likely to be duplicates. The server therefore treats file 101, consisting of segments 201, 202 and so on, as the original file and stores only the non-duplicated parts of file 102.

The server compares file 102 with the original segment by segment:
1. Segment 211 is identical to segment 201, so the server does not need to store it and does not receive it. It goes on to compare the check file 312 of the next segment 212 with the stored check file 302.
2. These also match, so the second segment 212 is not stored either.
3. On receiving and examining the check file 313 of the third segment 213, the server finds that check files 303 and 313 differ. Segments 203 and 213 therefore differ, so the server receives and stores segment 213 and generates its check file 313.
4. The remaining segments of file 102 are compared in the same way.

A segment is stored only when its check file differs, which avoids storing duplicate segments and improves efficiency.
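The segment-by-segment comparison can be sketched as a server-side routine that walks the uploader's per-segment CRCs alongside the original's. For matching positions it records only a reference. For differing positions it requests the segment from the client through a callback. The names `dedup_upload`, `fetch` and the `(file id, segment index)` reference format are hypothetical. As with any CRC, a match means the segments are very likely, not certainly, identical.

```python
import zlib
from typing import Callable, Dict, List, Tuple

Ref = Tuple[str, int]  # (id of the file that physically holds the segment, segment index)


def dedup_upload(
    store: Dict[Ref, bytes],
    original_id: str,
    original_crcs: List[int],
    new_id: str,
    new_crcs: List[int],
    fetch: Callable[[int], bytes],
) -> List[Ref]:
    """Return the new file's recipe: where each of its segments physically lives.

    Matching segments become references to the original; only mismatching
    segments are requested from the client (via `fetch`) and stored under new_id.
    """
    recipe: List[Ref] = []
    for i, crc in enumerate(new_crcs):
        if i < len(original_crcs) and original_crcs[i] == crc:
            recipe.append((original_id, i))  # same as the original: reference only
        else:
            store[(new_id, i)] = fetch(i)    # differs: receive and store this segment
            recipe.append((new_id, i))
    return recipe


# Demo modelled loosely on FIG. 4: only the third segment differs.
original = [b"seg1", b"seg2", b"seg3"]
store: Dict[Ref, bytes] = {("101", i): s for i, s in enumerate(original)}
upload = [b"seg1", b"seg2", b"SEG3"]
requested: List[int] = []


def fetch_from_client(i: int) -> bytes:
    requested.append(i)  # record which segments actually crossed the "network"
    return upload[i]


recipe = dedup_upload(
    store, "101", [zlib.crc32(s) for s in original],
    "102", [zlib.crc32(s) for s in upload], fetch_from_client,
)
print(recipe)     # [('101', 0), ('101', 1), ('102', 2)]
print(requested)  # [2]
```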

Later, when User 2 wants to download file 102 again, the original's segments 201, 202 and so on are downloaded. Segment 213, which was stored because it differs from the original file, is downloaded along with them.
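Reassembly for download can be sketched as reading each segment from wherever the stored reference information says it lives. The sketch assumes that information is a list of `(file id, segment index)` pairs. This format and the function names are illustrative assumptions, not the patent's storage format.

```python
from typing import Dict, Iterator, List, Tuple

Ref = Tuple[str, int]  # (id of the file that physically holds the segment, segment index)


def iter_download(store: Dict[Ref, bytes], recipe: List[Ref]) -> Iterator[bytes]:
    """Stream a file's segments in order, taking each from the original or the new file."""
    for ref in recipe:
        yield store[ref]


def download(store: Dict[Ref, bytes], recipe: List[Ref]) -> bytes:
    """Rebuild the whole file in memory from its segments."""
    return b"".join(iter_download(store, recipe))


store = {("101", 0): b"seg1-", ("101", 1): b"seg2-", ("101", 2): b"seg3", ("102", 2): b"SEG3"}
recipe_102 = [("101", 0), ("101", 1), ("102", 2)]
print(download(store, recipe_102))  # b'seg1-seg2-SEG3'
```

Streaming with `iter_download` avoids holding a large multimedia file in memory: each segment can be sent to the client as soon as it is read.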

In FIG. 4, the differing parts and their check files are stored. The identical parts, for example segment 1 (211) and segment 2 (212), must refer to the original's segment 1 (201) and segment 2 (202), so this reference information is stored as well. It can be stored as a file or in a database.

In the embodiment of Fig. 4, the client sends check files before uploading the file, and the server compares them with the check files it already holds. A check file is much smaller than the segment it represents, so relatively little data crosses the network. When the files are identical, or differ only in some segments, the client and server exchange only check files, and the server then receives only the differing segments. As a result, a user uploading a duplicate file gets a fast upload. Both storage space and network traffic are saved.

In the embodiment of Fig. 4, a segment may have a fixed size, for example 64 KB or 128 KB, and its check file may be 4 bytes or 8 bytes. The size of the check file can vary with the size of the segment.
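
The segment and check-file sizes above can be illustrated with a short sketch. It assumes fixed 64 KB segments and uses CRC-32 from Python's standard `zlib` module as the 4-byte check. The function name `segment_check_data` is hypothetical.

```python
import zlib
from typing import List

SEGMENT_SIZE = 64 * 1024  # assumed fixed segment size (64 KB, one of the example sizes)


def segment_check_data(data: bytes, segment_size: int = SEGMENT_SIZE) -> List[bytes]:
    """Split data into fixed-size segments and return a 4-byte CRC-32 check per segment."""
    checks: List[bytes] = []
    for start in range(0, len(data), segment_size):
        segment = data[start:start + segment_size]  # the last segment may be shorter
        crc = zlib.crc32(segment) & 0xFFFFFFFF       # mask keeps the value unsigned
        checks.append(crc.to_bytes(4, "big"))
    return checks


if __name__ == "__main__":
    payload = bytes(200 * 1024)         # 200 KB -> segments of 64, 64, 64 and 8 KB
    checks = segment_check_data(payload)
    print(len(checks), len(checks[0]))  # 4 4
```

With these assumed sizes, each 4-byte check stands in for 64 KB of data. This is why exchanging check files costs far less bandwidth than exchanging the segments themselves.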

The client may actually store the file divided into segments, and it may transmit the file segment by segment. The client can also generate the check files and send them as check data for each segment. In this specification, a "file" includes a file stored on a storage medium. It also includes a sequence of bits held as data in a region of memory.

A file that the user sees as a single file may also be divided into several segments, each stored as its own file. In that case, what the user perceives as one file is a file group, and each segment in the group can be a separate file. The segments the client transmits can therefore be individual files. Likewise, the segments stored on the file server can exist as independent files. In particular, when a file references an original but differs from it in some parts, each of its segments may be stored as a separate file. From the user's point of view, the unit of upload and download is then the file group, while the server and the client hold its segments as files.

There are two ways to select the segments to compare:

- Same-position selection picks the segments at the same position in both files. It applies when the two files have the same overall size and only some regions have been modified.
- Offset selection picks segments located a fixed number of bits before or after a particular region. It applies when portions have been added to or deleted from the original.

For example, suppose the original consists of segments 1, 2, 3, 4 and 5. The file to be uploaded consists of segments 1, 3, 4 and 5, because segment 2 was deleted. Segments 1, 3, 4 and 5 already exist in the original, so none of them needs to be uploaded.

With same-position selection, however, the server would make these comparisons:

- the original's segment 2 with the upload's segment 3;
- the original's segment 3 with the upload's segment 4;
- the original's segment 4 with the upload's segment 5.

Each pair would be treated as different. Instead, segments 4 and 5 can be located by their offset from segment 3, which both files share, and compared on that basis.
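
The difference between the two selection methods can be sketched over lists of per-segment check values. `positional_diff` compares segments at the same index. `realigned_diff` is a hypothetical illustration of the offset-based approach. It assumes that whenever a new segment's check value appears elsewhere in the original, later segments are compared at the same offset from that common segment. Check values are treated as opaque, hashable values.

```python
import zlib
from typing import Dict, Hashable, List, Sequence


def positional_diff(orig: Sequence[Hashable], new: Sequence[Hashable]) -> List[int]:
    """Indexes of new segments that differ from the original segment at the same index."""
    return [i for i, c in enumerate(new) if i >= len(orig) or orig[i] != c]


def realigned_diff(orig: Sequence[Hashable], new: Sequence[Hashable]) -> List[int]:
    """Indexes of new segments with no match, re-anchoring after insertions or deletions.

    `shift` is the offset between a segment's index in the new file and the
    index of the matching segment in the original.
    """
    first_index: Dict[Hashable, int] = {}
    for j, c in enumerate(orig):
        first_index.setdefault(c, j)
    shift = 0
    differing: List[int] = []
    for i, c in enumerate(new):
        j = i + shift
        if 0 <= j < len(orig) and orig[j] == c:
            continue                      # same segment at the expected offset
        if c in first_index:
            shift = first_index[c] - i    # re-anchor on a segment common to both files
            continue
        differing.append(i)               # genuinely new data: must be uploaded
    return differing


if __name__ == "__main__":
    checks = {n: zlib.crc32(b"segment-%d" % n) for n in range(1, 6)}
    orig = [checks[n] for n in (1, 2, 3, 4, 5)]
    new = [checks[n] for n in (1, 3, 4, 5)]   # segment 2 deleted
    print(positional_diff(orig, new))  # [1, 2, 3]
    print(realigned_diff(orig, new))   # []
```

Insertion is handled the same way. After an inserted segment, the next segment common to both files re-establishes the offset.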

In an embodiment of the invention, two files are compared segment by segment. Either the segments themselves or their check files may be compared. The files could also be compared directly to establish that they are identical. However, direct comparison of large multimedia files can take a long time, so it is preferable to compare partial information about the files, such as their check files.

The check file may be generated with the CRC method described above. Besides CRC, any computation over the bits of a file or segment can be used if it yields a value that is unique or unlikely to collide. If the check files holding such values are identical, the files or segments can be judged identical.
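
A check function can be swapped without changing the comparison logic. The sketch below illustrates this under assumptions and is not the method specified here. Besides the 4-byte CRC-32, it offers an 8-byte BLAKE2b digest and a 32-byte SHA-256 digest from Python's `hashlib`. These trade check-file size for a lower chance that two different segments share a value. The method names are hypothetical labels. Equal values lead to a judgment that the data is identical; they do not prove it.

```python
import hashlib
import zlib


def check_value(data: bytes, method: str = "crc32") -> bytes:
    """Return check data for a file or segment using the named (assumed) method."""
    if method == "crc32":
        return zlib.crc32(data).to_bytes(4, "big")            # 4 bytes
    if method == "blake2b-64":
        return hashlib.blake2b(data, digest_size=8).digest()  # 8 bytes
    if method == "sha256":
        return hashlib.sha256(data).digest()                  # 32 bytes
    raise ValueError(f"unknown check method: {method!r}")


def judged_identical(a: bytes, b: bytes, method: str = "crc32") -> bool:
    """Judge two segments identical when their check values are equal."""
    return check_value(a, method) == check_value(b, method)
```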

The file server 1000 includes a file information unit 950 and a file storage unit 960. As described with reference to Fig. 3, it also includes a transceiver that sends and receives files and communicates with clients.

The file information unit 950 holds information about each stored file:

- the file identifier;
- the location where the file is stored;
- the overall check file computed over the file's check files;
- the check file for the first segment;
- the file size;
- whether the file is an original, and if not, which original file it refers to;
- a reference count (counter).

To determine whether a file to be uploaded is a duplicate, the file server 1000 searches the file information unit 950. If the first check file and the file size match, and the overall check file also matches, the upload is identical to the stored file. In that case the server records in the file information unit 950 that it is the same file and does not store the file again.
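
The lookup described above can be sketched with a small record type and an index. The field names, the `(first_check, size)` index key and the three outcome labels are assumptions made for illustration. The text itself only specifies which fields the file information unit 950 holds and how matches on them are interpreted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FileInfo:
    """Hypothetical record kept by the file information unit 950."""
    file_id: str
    location: str
    overall_check: bytes        # check computed over all of the file's segment checks
    first_check: bytes          # check file of the first segment
    size: int
    original_id: Optional[str]  # None if this file is itself an original
    counter: int = 1            # reference count


Index = Dict[Tuple[bytes, int], List[FileInfo]]


def classify_upload(index: Index, first_check: bytes, size: int,
                    overall_check: bytes) -> Tuple[str, Optional[FileInfo]]:
    """Return ("identical" | "partial" | "new", matching stored record or None)."""
    candidates = index.get((first_check, size), [])
    for info in candidates:
        if info.overall_check == overall_check:
            return "identical", info     # do not store the file again
    if candidates:
        return "partial", candidates[0]  # store only the differing segments
    return "new", None                   # store the whole file as an original
```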

If the first check file and the file size match but the overall check file differs, some segments of the two files differ. In that case, only the differing segments are stored, as described with reference to Fig. 4. As shown in the file storage unit 960, FILE_0002 holds the segments and check files that differ from FILE_0001, together with its own overall check file. The counter field of FILE_0001 can be set to 2 to indicate that FILE_0002 uses FILE_0001 as its original.

When a user later downloads FILE_0002, the file information unit 950 shows that FILE_0002 uses FILE_0001 as its original. The duplicated portions (segment 1, segment 3, and so on) are therefore downloaded from FILE_0001, and the differing portion (segment 2) is downloaded from FILE_0002.
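
Serving a download from two sources can be sketched as reassembly from per-segment references. The `(file_id, segment index)` reference layout and the in-memory `store` dictionary are hypothetical stand-ins for the stored reference information and the file storage unit 960.

```python
from typing import Dict, List, Tuple

SegmentRef = Tuple[str, int]  # (file_id, segment index): hypothetical reference layout


def assemble(refs: List[SegmentRef], store: Dict[str, Dict[int, bytes]]) -> bytes:
    """Rebuild a file by reading each segment from the file its reference points to."""
    return b"".join(store[file_id][index] for file_id, index in refs)


if __name__ == "__main__":
    store = {
        "FILE_0001": {0: b"seg1-", 1: b"seg2-", 2: b"seg3"},  # the original
        "FILE_0002": {1: b"SEG2-"},                           # only the differing segment
    }
    refs_0002 = [("FILE_0001", 0), ("FILE_0002", 1), ("FILE_0001", 2)]
    print(assemble(refs_0002, store))  # b'seg1-SEG2-seg3'
```

Each segment carries its own file identifier in this layout. The same structure therefore also lets one file reference more than one original file.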

If the user of FILE_0001 deletes that file, it is not removed immediately. Instead, the counter of FILE_0001 is set to 1, because the user of FILE_0002 still needs FILE_0001 to download FILE_0002. If the user of FILE_0002 later deletes it, the counter of FILE_0002 is decremented to 0. The counter of its original, FILE_0001, is also decremented to 0. A batch program or deletion program can then remove files whose counter is 0 from the file storage unit 960.
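
The counter rules above can be sketched as reference counting with deferred deletion. The class and method names are hypothetical, and the sketch makes three assumptions:

- each upload starts with a counter of 1 for its own user;
- a file that references an original adds 1 to the original's counter;
- `delete` is called at most once per file, by its owner.

```python
from typing import Dict, List, Optional


class RefCountedStore:
    """Counter-based deletion for files that may reference an original."""

    def __init__(self) -> None:
        self.counter: Dict[str, int] = {}
        self.original: Dict[str, Optional[str]] = {}

    def add(self, file_id: str, original_id: Optional[str] = None) -> None:
        self.counter[file_id] = 1              # the uploading user's own reference
        self.original[file_id] = original_id
        if original_id is not None:
            self.counter[original_id] += 1     # the new file depends on its original

    def delete(self, file_id: str) -> None:
        """A user deletes a file: decrement its counter instead of removing it."""
        self._release(file_id)

    def _release(self, file_id: str) -> None:
        self.counter[file_id] -= 1
        if self.counter[file_id] == 0 and self.original[file_id] is not None:
            self._release(self.original[file_id])  # drop this file's hold on its original

    def collect_garbage(self) -> List[str]:
        """Batch cleanup: remove every file whose counter has reached 0."""
        dead = [f for f, c in self.counter.items() if c == 0]
        for f in dead:
            del self.counter[f]
            del self.original[f]
        return dead
```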

Fig. 5 shows FILE_0002 with a single original file. This is only one embodiment, and a file may have two or more originals. For example, if FILE_0003 is identical to the second segment of FILE_0002, the second segment of FILE_0002 can refer to FILE_0003. As another example, FILE_0003 may, like FILE_0002, use FILE_0001 as its original while being identical to FILE_0002. In that case FILE_0002 may be stored with FILE_0003, rather than FILE_0001, as its original. This applies when storing several similar files that differ only in part.

The file server 1000 receives a file upload request from the client 500 (S101). It then receives from the client 500 a check file for the file to be uploaded, or for part of that file (S102).

The check file may be generated by the CRC or checksum method described above. It may include the size of the file to be uploaded, and it may also include the file name or the content type. It may further include an identifier for the multimedia content. Such an identifier is used to manage copyright on digital files, and it can be used whenever multimedia content can be distinguished by it.

After receiving a check file containing this information, the file server 1000 searches for an original file that has the same check file (S103). The search uses the file information unit 950 of Fig. 5.

If an original file exists (S111), the server receives and stores only the differing portions (S113 to S142).

1. The server stores information about the file being uploaded in the file information unit 950 (S113). This lets an upload that stops because of the network or the computer environment be resumed later. The information can include a reference to the original file, for example a file identifier stored in the original field shown in Fig. 5.
2. The server updates the stored information of the original file in the file information unit 950 (S114). If someone deletes the original during the upload, this record shows that another file references it.
3. The server requests from the client a check file for the file being uploaded, or for part of it (S115). Either the file server 1000 requests it, or the client sends check files in sequence for comparison with the original.
4. The file server 1000 compares the received check file with the corresponding check file of the original (S116).
5. If they match (S131), the server checks whether this check file covers the last segment or the last part of the file (S141).
6. If it is the last, the server records in the file information unit 950 that the upload is complete, and the process ends (S142).

If it is not the end of the file, the server requests the check file for the next segment or next part of the file and repeats steps S115 to S141.

If the check files differ at step S131, the two segments differ. The server stores the check file (S133) and asks the client to send the file, or the part of the file, that the check file represents (S134). It then stores what it receives (S135) and proceeds to step S141. If this was not the last segment, it receives the next segment's check file and repeats the steps above.

If step S111 finds that no original file exists, the server stores information about the newly uploaded file in the file information unit 950 and marks the file as an original (S121). It then asks the client to send the file (S122). When it receives the file (S123), the file server generates check files so that this file can serve as an original in the future (S124). The check files can be generated in one pass after the whole file has been received. Alternatively, the file can be stored segment by segment as it arrives, with a check file generated for each segment.
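
The Fig. 6 flow can be condensed into a sketch of the server side. The callables `get_check` and `get_segment` are hypothetical stand-ins for requests to the client. The comparison is positional, and the resumable-upload bookkeeping (S113, S114) is omitted.

```python
import zlib
from typing import Callable, Dict, List, Optional, Tuple


def crc_check(data: bytes) -> bytes:
    """4-byte CRC-32 check file for a segment (an assumed check method)."""
    return zlib.crc32(data).to_bytes(4, "big")


def receive_upload(original_checks: Optional[List[bytes]],
                   num_segments: int,
                   get_check: Callable[[int], bytes],
                   get_segment: Callable[[int], bytes]
                   ) -> Tuple[Dict[int, bytes], Dict[int, bytes]]:
    """Return (segments, check files) the server stores for one upload."""
    segments: Dict[int, bytes] = {}
    checks: Dict[int, bytes] = {}
    if original_checks is None:                # S111: no original file
        for i in range(num_segments):          # S122-S123: receive the whole file
            segments[i] = get_segment(i)
            checks[i] = crc_check(segments[i])  # S124: build check files per segment
        return segments, checks
    for i in range(num_segments):              # S115-S141: one segment at a time
        check = get_check(i)                   # S115: request the check file
        if i < len(original_checks) and original_checks[i] == check:  # S116, S131
            continue                           # identical: nothing to receive
        checks[i] = check                      # S133: store the check file
        segments[i] = get_segment(i)           # S134-S135: request and store the segment
    return segments, checks                    # S142: upload complete
```

When an original exists, `get_segment` is called only for segments whose check files differ. That is where the network savings come from.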

In the flowchart of Fig. 6, when an original file exists, the server compares check files against the original's and receives from the client only the file portions that differ.

Alternatively, when the file to be uploaded has many check files, the client can send all of them at once. The file server then tells the client which parts differ and receives only those segments.
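
A minimal sketch of this batch variant follows. It assumes positional comparison and a caller-supplied check function, and the names `batch_upload` and `check_fn` are hypothetical. Both sides are modeled in one function to show what crosses the network: one message carrying all check files, and one carrying only the requested segments.

```python
import zlib
from typing import Callable, Dict, List, Sequence, Tuple


def batch_upload(original_checks: Sequence[bytes], segments: Sequence[bytes],
                 check_fn: Callable[[bytes], bytes]) -> Tuple[List[int], Dict[int, bytes]]:
    """Return the indexes the server asks for and the segments the client then sends."""
    # Exchange 1: the client sends every check file in one message.
    uploaded_checks = [check_fn(s) for s in segments]
    # The server replies with the indexes whose check files differ (positional here).
    wanted = [i for i, c in enumerate(uploaded_checks)
              if i >= len(original_checks) or original_checks[i] != c]
    # Exchange 2: the client sends only those segments.
    sent = {i: segments[i] for i in wanted}
    return wanted, sent


if __name__ == "__main__":
    crc = lambda d: zlib.crc32(d).to_bytes(4, "big")
    original = [crc(b"a"), crc(b"b")]
    print(batch_upload(original, [b"a", b"B", b"c"], crc))  # ([1, 2], {1: b'B', 2: b'c'})
```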

The client 500 sends file information (S201). This information includes a check file for verifying identity, the file size, the file name, and so on. Alternatively, the client may send a check file that contains this information. The transceiver 910 receives it and checks through the file information unit 950 whether an original file with this information exists (S205). If one exists, this is reported back (S208). The transceiver 910 then tells the client 500 that, because an original exists, only the differing parts will be uploaded, as identified through check files (S210).

To send the server only the parts of the file that are not duplicates, the client 500 sends the check files of the segments that make up the file. If the file to be uploaded consists of many small files, each of those files can be a segment.

For the first segment:

1. The client sends the check file of the first segment (S213).
2. The transceiver 910 forwards the check file to the file storage unit 960 (S215).
3. The file storage unit 960 compares it with the check file of the original (S217).
4. The check files match, so the segment they represent does not need to be received, and the comparison result is sent to the transceiver (S220).

For the second segment:

1. The transceiver 910 requests the check file of the second segment (S222), and the client 500 sends it (S225).
2. The transceiver 910 again forwards it to the file storage unit 960 (S227).
3. The file storage unit 960 compares it with the check file of the original's second segment (S229).
4. The two check files do not match, so the result is sent to the transceiver (S232).
5. Because the check files differ, the transceiver requests the second segment itself (S234), and the client sends it (S237).
6. The transceiver 910 passes the segment to the file storage unit 960 (S239).
7. The file storage unit 960 stores the segment and its check file (S241).

The client and the file server continue in this way, checking each segment for duplication and storing the segments that differ. When the client sends the check file of the last segment (S261), the transceiver 910 forwards it to the file storage unit 960 for comparison with the original. If they match, the comparison result is sent to the transceiver 910 (S268), and the transceiver notifies the client 500 that the upload is finished (S270). If the last segment's check file differs from the original's, the process of S234 to S241 is performed instead. Finally, the file information unit 950 updates or stores the information for the uploaded file and updates the information for the original file (S272).
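
The division of labor in Fig. 7 can be sketched as a transceiver that relays between the client and a storage unit that owns the comparison. The class names, the message-log format and the positional comparison are assumptions. Steps S201 to S210, which locate the original, are taken as already done.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Message = Tuple[str, str, Optional[int]]  # (direction, content, segment index)


class FileStorageUnit:
    """Role of file storage unit 960: compares check files and stores differing segments."""

    def __init__(self, original_checks: Sequence[bytes]) -> None:
        self.original_checks = list(original_checks)
        self.segments: Dict[int, bytes] = {}
        self.checks: Dict[int, bytes] = {}

    def compare(self, i: int, check: bytes) -> bool:
        return i < len(self.original_checks) and self.original_checks[i] == check

    def store(self, i: int, segment: bytes, check: bytes) -> None:
        self.segments[i] = segment
        self.checks[i] = check


def relay_upload(storage: FileStorageUnit, client_checks: Sequence[bytes],
                 client_segments: Sequence[bytes]) -> List[Message]:
    """Role of transceiver 910: forward each check file, fetch segments that differ."""
    log: List[Message] = []
    for i, check in enumerate(client_checks):
        log.append(("client->transceiver", "check file", i))
        log.append(("transceiver->storage", "check file", i))
        if storage.compare(i, check):
            log.append(("storage->transceiver", "same", i))
            continue
        log.append(("storage->transceiver", "different", i))
        log.append(("transceiver->client", "send segment", i))
        log.append(("client->transceiver", "segment", i))
        log.append(("transceiver->storage", "segment", i))
        storage.store(i, client_segments[i], check)
    log.append(("transceiver->client", "upload finished", None))
    return log
```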

Those skilled in the art to which the invention pertains will understand that the invention can be embodied in other specific forms without changing its technical spirit or essential features. The embodiments described above are therefore illustrative in all respects and not restrictive. The scope of the invention is defined by the claims below rather than by the detailed description. All changes and modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the invention.

Implementing the invention improves storage efficiency, because only the non-duplicated portions of files are stored.

Implementing the invention reduces network load, because uploads of duplicate data are limited.

Claims

Receiving, from a client, check data for at least a portion of a first file;

Searching for a second file having the same check data as the received check data;

If the second file exists, comparing the first file from the client with the second file to extract an identical portion and a differing portion; and

Storing only the differing portion of the first file, the identical portion of the first file referring to the second file.

The method of claim 1,

The extracting step

comprises comparing check data for a first region of the first file with check data for a second region of the second file,

wherein the position of the first region within the first file is the same as the position of the second region within the second file.

The method of claim 1,

The extracting step

comprises comparing check data for a first region of the first file with check data for a second region of the second file, wherein the first region and the second region are located at the same offset from a third region whose bitstring is common to the first file and the second file.

The method of claim 1,

The storing step

further comprises storing check data for a portion of the first file.

The method of claim 1,

wherein the check data includes data obtained by applying a CRC or checksum scheme to at least a portion of the first file.

The method of claim 1,

wherein the check data includes information on the size of the first file.

Receiving, from a client, check data for at least some files of a first file group;

Searching for a second file group having the same check data as the received check data;

If the second file group exists, comparing a first file of the first file group from the client with a second file of the second file group; and

Storing the first file as a reference to the second file when the first file is identical to the second file, and storing the first file itself when it is not identical to the second file.

The method of claim 7, wherein

The comparing step

comprises comparing the check data of the first file with the check data of the second file.

The method of claim 7, wherein

The storing step

further comprises storing the check data for the first file when the first file is stored.

The method of claim 7, wherein

the check data includes data obtained by applying a CRC or checksum scheme to the first file group.

The method of claim 7, wherein

the check data includes information on the size of the first file group.

Receiving, from a client, information about a first file to be downloaded;

Searching for whether a second file having a portion identical to a portion of the first file exists; and

If the second file exists, extracting the portion common to the first file and the second file from the second file and transmitting it to the client, and extracting the portion that differs between the first file and the second file from the stored first file and transmitting it to the client.

The method of claim 12,

further comprising determining the identical or differing portions by referring to check data for the first file.

The method of claim 13,

wherein the check data includes information about the size of the first file.

The method of claim 12,

wherein the first file or the second file is a file group comprising two or more files.

A file server comprising: a transceiver that receives, from a client, check data for at least a portion of a first file;

A file information unit that searches for a second file having the same check data as the received check data and stores the received check data; and

A file storage unit that, if the second file exists, compares the first file from the client with the second file to extract the identical and differing portions, stores the identical portion as a reference to the second file, and stores the differing portion using the corresponding portion of the first file.

The file server of claim 17, wherein

the transceiver receives, from the client, information about a first file to be downloaded;

the file information unit uses the received information to search for whether the second file, having a portion identical to the first file, exists; and

when the second file exists, the transceiver extracts the portion common to the first file and the second file from the second file stored in the file storage unit and transmits it to the client, and extracts the portion that differs between the first file and the second file from the first file stored in the file storage unit and transmits it to the client.

The file server of claim 17, wherein

the file information unit performs the extraction by comparing check data for a first region of the first file with check data for a second region of the second file.

The file server of claim 17, wherein

the first region and the second region are located at the same offset from a third region whose bitstring is common to the first file and the second file.

The file server of claim 17, wherein

the file storage unit stores the check data for a portion of the first file.

The file server of claim 17, wherein

the check data includes data obtained by applying a CRC or checksum scheme to at least a portion of the first file.

The file server of claim 17, wherein

the check data includes information about the size of the first file.

The file server of claim 17 or 18, wherein

the first file or the second file is a file group comprising one or more files.