JP7529673B2

JP7529673B2 - Content-agnostic file indexing method and system

Info

Publication number: JP7529673B2
Application number: JP2021540318A
Authority: JP
Inventors: Christopher McElveen
Original assignee: Mcelveen Christopher; Lognovations Holdings LLC
Current assignee: Mcelveen Christopher; Lognovations Holdings LLC
Priority date: 2019-01-10
Filing date: 2020-01-08
Publication date: 2024-08-06
Anticipated expiration: 2040-01-08
Also published as: WO2020146448A1; JP2022518194A; CA3126012A1; AU2020205970B2; EP3908937A4; KR20210110875A; EP3908937A1; AU2020205970A1

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of patent application Ser. No. 15/730,043, filed Oct. 11, 2017, and entitled "Method and System for Content Agnostic File Indexing," the contents of which are hereby incorporated by reference in their entirety for all purposes.

COMPUTER PROGRAM LISTING - SEQUENCE LISTING
The following computer program listing is submitted herewith and incorporated by reference. Each file is incorporated by reference. Each entry in the listing below has the following format: <size in bytes> <creation date> <filename>.

3864 May 16 2018 squeeze-master-README-md.txt*
83675 May 16 2018 squeeze-master-SqueezeReport-ipynb.txt*
4293 May 16 2018 squeeze-master-demo_app-py.txt*
98 May 16 2018 squeeze-master-gitignore.txt*
1383 May 16 2018 squeeze-master-requirements.txt*
2490 May 16 2018 squeeze-master-rpc_server-py.txt*
239 May 16 2018 squeeze-master-scripts-buildprotos.txt*
942 May 16 2018 squeeze-master-scripts-file_test-py.txt*
1391 May 16 2018 squeeze-master-scripts-generate_key-py.txt*
711 May 16 2018 squeeze-master-scripts-generate_keyset-py.txt*
629 May 16 2018 squeeze-master-scripts-keys_from_folder-py.txt*
377 May 16 2018 squeeze-master-scripts-lzw_test-py.txt*
107 May 16 2018 squeeze-master-scripts-runserver.txt*
3928 May 16 2018 squeeze-master-scripts-squeeze-bytes-report-py.txt*
63 May 16 2018 squeeze-master-scripts-squeeze_file-py.txt*
1060 May 16 2018 squeeze-master-scripts-squeeze_test-py.txt*
947 May 16 2018 squeeze-master-scripts-string_test-py.txt*
222 May 16 2018 squeeze-master-scripts-test_binary-py.txt*
1799 May 16 2018 squeeze-master-scripts-test_rpc-py.txt*
2736 May 16 2018 squeeze-master-scripts-time-squeeze-string-py.txt*
211 May 16 2018 squeeze-master-scripts-time_keygen-py.txt*
65 May 16 2018 squeeze-master-scripts-unsqueeze_file-py.txt*
80 May 16 2018 squeeze-master-setup-py.txt*
10657 May 16 2018 squeeze-master-squeeze-__init__-py.txt*
2783 May 16 2018 squeeze-master-squeeze-bitstring-py.txt*
9191 May 16 2018 squeeze-master-squeeze-keys-py.txt*
613 May 16 2018 squeeze-master-squeeze-performance-csv.txt*
22445 May 16 2018 squeeze-master-squeeze-squeeze_pb2-py.txt*
2232 May 16 2018 squeeze-master-squeeze-squeeze_pb2_grpc-py.txt*
3366 May 16 2018 squeeze-master-squeeze-proto.txt*
875 May 16 2018 squeeze-master-templates-layout-html.txt*
816 May 16 2018 squeeze-master-templates-upload_form-html.txt*
1513 May 16 2018 squeeze-master-templates-uploaded_file-html.txt*
200 May 16 2018 squeezerpc-master-Makefile.txt*
1131 May 16 2018 squeezerpc-master-README-md.txt*
7 May 16 2018 squeezerpc-master-gitignore.txt*
8995 May 16 2018 squeezerpc-master-main-go.txt*
21292 May 16 2018 squeezerpc-master-squeeze-squeeze-pb-go.txt*
3366 May 16 2018 squeezerpc-master-squeeze-proto.txt*

The present disclosure relates to a method for content-agnostic file referencing. The disclosure further relates to a method for content-agnostic data compression.

File referencing techniques typically require knowledge of the type of data being stored in order to index the data efficiently in a file referencing system. Similarly, knowledge of the data in question is typically used to create improved compression approaches that reduce the size of data for transmission, storage, and the like.

There is a need in the industry for improved file referencing and data compression techniques that reduce the amount of data that must be stored and/or transmitted.

According to one embodiment, the present disclosure provides a method for improving computing technology using an enhanced content-agnostic file referencing system. The method improves the operation of the computer itself.

The disclosed method has several important advantages. For example, the disclosed method allows file referencing of any content type.

The disclosed method may further significantly reduce the amount of information or data that must be persisted or transmitted, because the data may be generated at the time of access rather than being persisted.

Various embodiments of the present disclosure may have none, some, or all of these advantages. Other technical advantages of the present disclosure may be readily apparent to those skilled in the art.

For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description, taken in conjunction with the accompanying drawings, in which like reference numbers refer to like parts or steps throughout the several views.
FIG. 1 is a flowchart outlining the steps of one embodiment of the present disclosure.
FIG. 2 is another flowchart outlining the steps of another embodiment of the present disclosure.
FIG. 3 is a flowchart outlining the steps of an alternative embodiment of the present disclosure.

This disclosure relates to a method for content-agnostic indexing of data. The method can be used for a variety of computer-specific needs, for example as a file referencing system or as a compression system.

In the following disclosure, the invention is described with respect to compressing binary data by way of example, but the teachings work equally well with any type of data, which is more aptly referred to as "n-ary" data. For example, the methods and systems operate on qubits as well as bits.

One embodiment of the present invention includes a method as described in the flowchart depicted in FIG. 1. Binary data (n_i) to be persisted or transmitted (e.g., a data file) is analyzed to determine its length in bits (l(n_i)). Using this information, in step 106, the method calculates all permutations of data of the identified length. For example, if the input data is:
01

the input data is 2 bits long. In step 106, all permutations of 2 bits are generated, namely:
{00}{01}{10}{11}

At step 108, the method determines the index (n_f) of the input binary data within the generated permutations. In the above example, the returned index (n_f) would be "1", counting from zero. Finally, rather than storing or transmitting the input binary data (i.e., "01"), the system instead stores or transmits the length (2) and the index (1).
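By way of illustration only, the encoding steps of FIG. 1 can be sketched in Python as follows. The function name `squeeze` and the ascending enumeration order are assumptions of this sketch, chosen to agree with the {00}{01}{10}{11} ordering shown above; it is not the code from the accompanying program listing.

```python
from itertools import product


def squeeze(bits: str) -> tuple[int, int]:
    """Return (length, index) for a bit string such as "01"."""
    length = len(bits)
    # Step 106: enumerate every bit string of this length in ascending order.
    permutations = ["".join(p) for p in product("01", repeat=length)]
    # Step 108: find the position of the input among the enumerated strings.
    # list.index raises ValueError if the input is not a valid bit string.
    index = permutations.index(bits)
    return length, index


print(squeeze("01"))  # (2, 1)
```

Enumerating every permutation grows as 2 to the power of the length, so a sketch of this kind is practical only for short inputs.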

When the original input data needs to be decoded (e.g., on a request to retrieve the original binary data from disk, or on receipt of data sent over a network), the method requires only the length (l(n_i)) and the index (n_f) as input. In the above example, the inputs are the length (2) and the index (1). As shown in FIG. 2, the system computes all permutations of the input length. In the above example, the following permutations are generated:
{00}{01}{10}{11}

The system moves to the specified index (1 in the above example) and returns the permutation found there. Using the above example again, the original binary data "01" is returned.
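The decoding steps of FIG. 2 can be sketched in the same hedged way. The name `unsqueeze` and the explicit range check are assumptions of this sketch; the enumeration order must match the one used when the index was produced.

```python
from itertools import product


def unsqueeze(length: int, index: int) -> str:
    """Return the bit string of the given length found at the given index."""
    # Regenerate the same ascending enumeration that was used for encoding.
    permutations = ["".join(p) for p in product("01", repeat=length)]
    if not 0 <= index < len(permutations):
        raise IndexError("index is outside the range for this length")
    return permutations[index]


print(unsqueeze(2, 1))  # 01
```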

The above method is described in terms of a binary system (i.e., the input data is binary data) for illustrative purposes. The method and system operate similarly for n-ary systems. Although the binary system above essentially operates in the Euclidean plane, for n-ary data Hilbert space offers the same conceptual advantages. The method and process can be generalized to n-ary data as follows:
d^n = p(i)
(d^n)n = p(f)
where
d = the order of the system
n = the length in n-ary units appropriate to the order of the system
p(i) = the initial index
p(f) = the final index
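One reading of d^n is the number of distinct strings of length n over d symbols. The following Python sketch assumes that reading and shows how a position in the ascending enumeration could be computed for any order d. The names `nary_count` and `nary_index` are hypothetical, and the sketch does not attempt to interpret the second relation.

```python
def nary_count(d: int, n: int) -> int:
    """Number of distinct length-n strings over d symbols (d^n)."""
    return d ** n


def nary_index(digits: list[int], d: int) -> int:
    """Position of a digit sequence in the ascending enumeration of base-d strings."""
    index = 0
    for digit in digits:
        if not 0 <= digit < d:
            raise ValueError("digit is outside the range for this order")
        # Shift the running index by one base-d place and add the new digit.
        index = index * d + digit
    return index


print(nary_count(2, 2))          # 4
print(nary_index([0, 1], 2))     # 1
print(nary_index([1, 0, 2], 3))  # 11
```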

It should be noted that, given two alternative systems of different order and the same input file, the system with the higher order will have a higher n-ary density than the system with the lower order.

An example of this method is disclosed in the following Ruby code snippet, which illustrates the method as disclosed in FIG. 1.

The following snippet illustrates the method disclosed in FIG. 2 for an input length (l(n_i)) of 16 and an index (n_f) of 72,629.
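As an illustrative Python counterpart, and not the Ruby snippet referred to above, the sketch below assumes the ascending enumeration order used in the 2-bit example. Under that assumption the index of a bit string equals its value as an unsigned binary number, so the list of permutations need not be built. For a 16-bit length the valid indices run from 0 to 65,535, and the sketch rejects anything outside that range. The function names and the example index 29,301 are hypothetical.

```python
def squeeze_direct(bits: str) -> tuple[int, int]:
    """Return (length, index) by reading the bit string as a binary number."""
    return len(bits), int(bits, 2)


def unsqueeze_direct(length: int, index: int) -> str:
    """Return the bit string of the given length for the given index."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0 <= index < 2 ** length:
        raise ValueError("index is outside the range for this length")
    # Zero-pad on the left so the result always has exactly `length` bits.
    return format(index, f"0{length}b")


print(unsqueeze_direct(16, 29301))         # 0111001001110101
print(squeeze_direct("0111001001110101"))  # (16, 29301)
```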

In a preferred embodiment, an input byte string is converted into a bit string that corresponds to a representation of the input byte string. This bit string is then processed by the method described herein.
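A minimal Python sketch of such a conversion is given below. It assumes each byte is written as eight bits, most significant bit first; the text does not fix the bit order, and the function names are hypothetical.

```python
def bytes_to_bits(data: bytes) -> str:
    """Convert a byte string to a string of '0' and '1' characters."""
    # Each byte becomes exactly eight bits, most significant bit first.
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_bytes(bits: str) -> bytes:
    """Convert a bit string whose length is a multiple of 8 back to bytes."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(bytes_to_bits(b"A"))        # 01000001
print(bits_to_bytes("01000001"))  # b'A'
```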

In an alternative embodiment, rather than generating a table based on the length of the data, a table may be pre-generated with all permutations of data of a particular length. This pre-generated table may be persisted in memory, whether non-volatile or volatile. Continuing the above example, if the predetermined length is 2 bits, the pre-generated table includes all permutations of 2-bit data:
{00}{01}{10}{11}

In one embodiment, this table may be stored in a correspondingly indexed array as follows:
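The indices used in the surrounding examples ("00" at index 0, "01" at index 1) suggest an array of the following shape. The Python sketch below builds it for a 2-bit width; `build_table` and the reverse-lookup dictionary are assumptions of the sketch, not the layout of the accompanying listing.

```python
from itertools import product


def build_table(width: int) -> list[str]:
    """Return every bit string of the given width, in ascending order."""
    return ["".join(p) for p in product("01", repeat=width)]


table = build_table(2)
# A reverse mapping from pattern to index makes lookups a dictionary access.
lookup = {pattern: index for index, pattern in enumerate(table)}

for index, pattern in enumerate(table):
    print(index, pattern)
# 0 00
# 1 01
# 2 10
# 3 11
```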

This pre-generated table may be stored on disk, in RAM, or the like. Preferably, the pre-generated table is stored both on the computer system that reduces the file size (or squeezes the file) and on the computer system that expands the reduced file (or unsqueezes the data).

Upon receiving input data, the method "chunks" the data into smaller subsets. As used herein, "chunking" means taking a data string and creating smaller data strings that constitute subsets of the larger data string. All the chunks taken together make up the original data string. For example, if the input data is:
011001110001

it will be chunked into the following 4-bit chunks:
0110 0111 0001

Each chunk is then compared to the pre-generated table to see whether there is a match. In the above example, the chunk size is 4 bits, so none of the chunks will be found in the table, because the table holds only the permutations of 2-bit chunks. Each chunk is therefore re-chunked, as follows:
01 10 01 11 00 01

The method continues for each chunk until that chunk is located in the pre-generated table. At that point, the chunk is associated with its respective index and, preferably, a series of tuples is generated indicating the chunk level and the corresponding index. In the above example, the system chunked twice, so the index associations are as follows:
{2,1}{2,2}{2,1}{2,3}{2,0}{2,1}

In this example, the original input data "011001110001" was ultimately split into six chunks of 2 bits each. As shown, each chunk is represented by its chunk level (2) and the corresponding index into the pre-generated table.
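The fixed-size chunking described above can be sketched in Python as follows. The sketch assumes a 2-bit table, a first pass of 4 bits at chunk level 1, and one additional level for each further split; these conventions are inferred from the tuples in the example, and all names are hypothetical.

```python
TABLE_WIDTH = 2
# Map each 2-bit pattern to its index: {"00": 0, "01": 1, "10": 2, "11": 3}.
TABLE = {format(i, f"0{TABLE_WIDTH}b"): i for i in range(2 ** TABLE_WIDTH)}


def split_fixed(bits: str, size: int) -> list[str]:
    """Cut a bit string into consecutive pieces of at most `size` bits."""
    return [bits[i:i + size] for i in range(0, len(bits), size)]


def squeeze_chunk(chunk: str, level: int) -> list[tuple[int, int]]:
    """Return (level, index) tuples for one chunk, splitting it if needed."""
    if chunk in TABLE:
        return [(level, TABLE[chunk])]
    if len(chunk) <= TABLE_WIDTH:
        raise ValueError("chunk is shorter than the table width")
    half = len(chunk) // 2
    # The chunk is not in the table, so split it in two and go one level deeper.
    return squeeze_chunk(chunk[:half], level + 1) + squeeze_chunk(chunk[half:], level + 1)


def squeeze(bits: str, first_size: int = 4) -> list[tuple[int, int]]:
    """Chunk the input at `first_size` bits, then re-chunk until found."""
    tuples: list[tuple[int, int]] = []
    for chunk in split_fixed(bits, first_size):
        tuples.extend(squeeze_chunk(chunk, 1))
    return tuples


print(squeeze("011001110001"))
# [(2, 1), (2, 2), (2, 1), (2, 3), (2, 0), (2, 1)]
```

A trailing chunk that already matches the table width keeps chunk level 1, so an input such as "0110011100" would end with the tuple (1, 0).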

The data may be chunked in any number of ways. For example, the data may be chunked based on a predetermined size, as in the example above (where the predetermined size was 4 bits for purposes of the example). Alternatively, the input data may be recursively chunked into two separate data chunks until each data chunk is found in the pre-generated table. Using the same input data as above, this split-in-two approach yields the following first-level chunks:
011001 110001

Because these chunks are not found in the pre-generated table, they are chunked again:
011 001 110 001

Again, the chunked data is not found in the pre-generated table and must be chunked again:
01 1 00 1 11 0 00 1

Notably, some segments have been chunked into data smaller than the entries of the pre-generated table (i.e., the segments "1", "1", "0", and "1"). These segments may be padded so that they can be compared with the pre-generated table. Numbers may be stored in either big-endian or little-endian byte order, provided consistency is maintained. For example, using big-endian byte order, the above chunked data would be represented as follows:
01 10 00 10 11 00 00 10

The method then continues as described above.
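The recursive split-in-two variant, including the padding step, can be sketched in Python as follows. The sketch assumes that the first half takes the extra bit when a chunk has odd length, that short segments are padded on the right with zeros (so "1" becomes "10"), and that the chunk level counts the number of splits. It does not record how many padding bits were added, and all names are hypothetical.

```python
TABLE_WIDTH = 2
# Map each 2-bit pattern to its index: {"00": 0, "01": 1, "10": 2, "11": 3}.
TABLE = {format(i, f"0{TABLE_WIDTH}b"): i for i in range(2 ** TABLE_WIDTH)}


def halve(bits: str, level: int = 0) -> list[tuple[int, str]]:
    """Split bits in two recursively until every piece fits the table width."""
    if len(bits) <= TABLE_WIDTH:
        return [(level, bits)]
    middle = (len(bits) + 1) // 2  # the first half takes the extra bit, if any
    return halve(bits[:middle], level + 1) + halve(bits[middle:], level + 1)


def squeeze_by_halving(bits: str) -> list[tuple[int, int]]:
    """Return (level, index) tuples, padding short pieces before lookup."""
    result = []
    for level, piece in halve(bits):
        padded = piece.ljust(TABLE_WIDTH, "0")  # pad on the right with zeros
        result.append((level, TABLE[padded]))
    return result


print([piece for _, piece in halve("011001110001")])
# ['01', '1', '00', '1', '11', '0', '00', '1']
print(squeeze_by_halving("011001110001"))
# [(3, 1), (3, 2), (3, 0), (3, 2), (3, 3), (3, 0), (3, 0), (3, 2)]
```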

It is not necessary that all chunks of data be found in the pre-generated table at the same chunk level. For example, using the above pre-generated table of 2-bit combinations, if the input data is:
0110011100

this data may initially be chunked into 4-bit sequences as described above:
0110 0111 00

As above, the first two 4-bit sequences (i.e., "0110" and "0111") must be re-chunked into smaller chunks in order to be located in the pre-generated table, resulting in the following chunks:
01 10 01 11 00

Also as above, the chunks are associated with their chunk levels and corresponding indices as follows:
{2,1}{2,2}{2,1}{2,3}{1,0}

Note that the last tuple above has a chunk level of 1, because that chunk did not require a second chunking.

Once the input data has been reduced to a series of chunk levels and indices, that series is used to identify the original data. The associations may be stored as a series of tuples, as individual bit strings, or in other ways.
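As one purely hypothetical way of storing the associations as a bit string, each tuple could be written as a fixed-width chunk-level field followed by a fixed-width index field. The field widths below (4 bits for the level, 2 bits for the index) and the function names are assumptions of this sketch; the text does not specify a storage format.

```python
LEVEL_BITS = 4  # hypothetical width of the chunk-level field
INDEX_BITS = 2  # matches a 2-bit pre-generated table


def pack(tuples: list[tuple[int, int]]) -> str:
    """Serialize (level, index) tuples into one bit string."""
    fields = []
    for level, index in tuples:
        if not 0 <= level < 2 ** LEVEL_BITS or not 0 <= index < 2 ** INDEX_BITS:
            raise ValueError("level or index does not fit its field")
        fields.append(format(level, f"0{LEVEL_BITS}b") + format(index, f"0{INDEX_BITS}b"))
    return "".join(fields)


def unpack(bits: str) -> list[tuple[int, int]]:
    """Parse a bit string produced by pack back into (level, index) tuples."""
    step = LEVEL_BITS + INDEX_BITS
    if len(bits) % step != 0:
        raise ValueError("bit string length is not a multiple of the tuple size")
    return [
        (int(bits[i:i + LEVEL_BITS], 2), int(bits[i + LEVEL_BITS:i + step], 2))
        for i in range(0, len(bits), step)
    ]


packed = pack([(2, 1), (1, 0)])
print(packed)          # 001001000100
print(unpack(packed))  # [(2, 1), (1, 0)]
```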

To recreate (or unsqueeze) the data from a series of chunk levels and indices, the process operates in reverse. The system again needs the same pre-generated table. For each tuple of chunk level and index, the system consults the pre-generated table to unpack the squeezed chunk back into the original data.
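A minimal Python sketch of this reverse step is shown below, assuming the same 2-bit table in ascending order and chunks that needed no padding. In this simple case the chunk level is carried along but is not needed to rebuild the data; the function name is hypothetical.

```python
TABLE_WIDTH = 2
# Index -> pattern: ["00", "01", "10", "11"].
TABLE = [format(i, f"0{TABLE_WIDTH}b") for i in range(2 ** TABLE_WIDTH)]


def unsqueeze(tuples: list[tuple[int, int]]) -> str:
    """Rebuild the bit string by looking up each index and concatenating."""
    pieces = []
    for _level, index in tuples:
        if not 0 <= index < len(TABLE):
            raise IndexError("index is outside the pre-generated table")
        pieces.append(TABLE[index])
    return "".join(pieces)


print(unsqueeze([(2, 1), (2, 2), (2, 1), (2, 3), (1, 0)]))  # 0110011100
```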

This alternative embodiment is illustrated in the flowchart of FIG. 3. First, a pre-generated table containing all permutations of data of a particular length is created in step 302. As noted above, the table is preferably persisted in some manner. Next, in step 304, the system receives the input data to be squeezed. Then, in steps 306 and 308, the process chunks the data into smaller segments until the length of the data is a length that can be located in the pre-generated table. As noted above, the process maintains the chunk level so that the system knows how many times the input data set has been chunked. Each chunk is then located in the pre-generated table in step 310. Finally, the chunk, its chunk level, and its respective index in the pre-generated table are associated, resulting in the squeezed data in step 312.

While the present disclosure has been described in terms of specific embodiments and generally associated methods, modifications and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of exemplary embodiments does not constrain the present disclosure. Other changes, substitutions, and alterations are possible without departing from the spirit and scope of the present disclosure.
[Appendix 1]
A computer-implemented method for content-agnostic referencing of a binary data file, comprising:
pre-generating a table using an input seed, the table containing all permutations of bits of a predetermined length;
determining a length of the binary data file, the length comprising the number of bits in the binary data file;
chunking the binary data file into substrings, each substring being of a length less than the length of the binary data file;
for each chunk of the binary data file, determining whether the chunk is within the pre-generated table and, if the chunk is within the pre-generated table, associating with the chunk an index of the chunk's location within the pre-generated table and, if the chunk is not within the pre-generated table, splitting the chunked binary data into smaller chunks; and
indicating the binary data file using the number of chunks and the associated indices of all chunks.
[Appendix 2]
The method of Appendix 1, wherein indicating the binary data file using the number of chunks and the associated indices of all chunks comprises persisting the number of chunks and all associated indices to a storage device in place of the binary data file.
[Appendix 3]
The method of Appendix 1, wherein indicating the binary data file using the number of chunks and the associated indices of all chunks comprises transmitting the number of chunks and the associated indices of all chunks in place of the data file.
[Appendix 4]
The method of Appendix 3, wherein the transmitting step transmits the number of chunks and the associated indices of all chunks over a network.
[Appendix 5]
The method of Appendix 3, wherein the transmitting step transmits the number of chunks and the associated indices of all chunks over a bus.
[Appendix 6]
The method of Appendix 1, wherein indicating the binary data file using the number of chunks and the associated indices of all chunks comprises creating a tuple of ordered pairs, each ordered pair indicating a chunk level and an associated index.
[Appendix 7]
The method of Appendix 1, wherein indicating the binary data file using the number of chunks and the associated indices of all chunks comprises persisting the number of chunks and the associated indices of all chunks on a storage device.
[Appendix 8]
The method of Appendix 7, wherein the storage device is a disk.
[Appendix 9]
The method of Appendix 1, wherein the pre-generated table is a hash table.
[Appendix 10]
The method of Appendix 1, wherein the pre-generated table is a matrix.
[Appendix 11]
The method of Appendix 1, wherein the pre-generated table is persisted in volatile memory.
[Appendix 12]
The method of Appendix 1, wherein the pre-generated table is persisted in non-volatile memory.
[Appendix 13]
The method of Appendix 1, wherein chunking the binary data file into substrings further comprises chunking the binary data into chunks of a predetermined length.
[Appendix 14]
The method of Appendix 13, wherein the predetermined length is 2 megabytes.
[Appendix 15]
The method of Appendix 13, wherein the predetermined length is less than 2 megabytes.
[Appendix 16]
The method of Appendix 13, wherein the predetermined length is greater than 2 megabytes.
[Appendix 17]
The method of Appendix 1, wherein chunking the binary data file into substrings further comprises recursively splitting the binary data file into two chunks of equal size.
[Appendix 18]
A method for retrieving data based on a number of chunks and the associated indices of all chunks, comprising:
pre-generating a table using an input seed, the table containing all permutations of bits of a predetermined length, the number of chunks and the associated indices having been generated using the pre-generated table;
for each chunk, locating data in the table at the index associated with that chunk; and
returning the data associated with each chunk.
[Appendix 19]
The method of Appendix 18, wherein returning the data associated with each chunk comprises concatenating the data associated with each chunk into a single bitstream.

Claims

1. An information processing method for content-agnostic referencing of a binary data file, executed on a computer system, comprising:
using the computer system to pre-generate a table using an input seed, the table containing all permutations of bits of a predetermined length;
using the computer system to determine a length of the binary data file, the length comprising the number of bits in the binary data file;
using the computer system to chunk the binary data file into substrings, each substring being of a length less than the length of the binary data file;
using the computer system to determine, for each chunk of the binary data file, whether the chunk is within the pre-generated table and, if the chunk is within the pre-generated table, associating with the chunk an index of the chunk's location within the pre-generated table and, if the chunk is not within the pre-generated table, splitting the chunked binary data into smaller chunks; and
using the computer system to indicate the binary data file using the number of chunks and the associated indices of all chunks.

2. The method of claim 1, wherein indicating the binary data file using the number of chunks and the associated indices of all chunks comprises persisting the number of chunks and all associated indices to a storage device in place of the binary data file.

3. The method of claim 1, wherein indicating the binary data file using the number of chunks and the associated indices of all chunks comprises transmitting the number of chunks and the associated indices of all chunks in place of the binary data file.

4. The method of claim 3, wherein the transmitting step transmits the number of chunks and the associated indices of all chunks over a network.

5. The method of claim 3, wherein the transmitting step transmits the number of chunks and the associated indices of all chunks over a bus.

6. The method of claim 1, wherein indicating the binary data file using the number of chunks and the associated indices of all chunks comprises creating a tuple of ordered pairs, each ordered pair indicating a chunk level and an associated index.

7. The method of claim 1, wherein indicating the binary data file using the number of chunks and the associated indices of all chunks comprises persisting the number of chunks and the associated indices of all chunks on a storage device.

8. The method of claim 7, wherein the storage device is a disk.

9. The method of claim 1, wherein the pre-generated table is a hash table.

10. The method of claim 1, wherein the pre-generated table is a matrix.

11. The method of claim 1, wherein the pre-generated table is persisted in volatile memory.

12. The method of claim 1, wherein the pre-generated table is persisted in non-volatile memory.

13. The method of claim 1, wherein chunking the binary data file into substrings further comprises chunking the binary data into chunks of a predetermined length.

14. The method of claim 13, wherein the predetermined length is 2 megabytes.

15. The method of claim 13, wherein the predetermined length is less than 2 megabytes.

16. The method of claim 13, wherein the predetermined length is greater than 2 megabytes.

17. The method of claim 1, wherein chunking the binary data file into substrings further comprises recursively splitting the binary data file into two chunks of equal size.