TWI825305B - Transcoder and method and article for transcoding - Google Patents
Transcoder and method and article for transcoding
- Publication number
- TWI825305B (TW109112659A)
- Authority
- TW
- Taiwan
- Prior art keywords
- data
- chunk
- encoding
- encoded data
- input
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0611—Improving I/O performance in relation to response time
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0656—Data buffering arrangements
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
- G06F40/157—Transformation using dictionaries or tables
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1016—Performance improvement
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Description
The inventive concepts relate generally to storage devices and, more particularly, to transcoding data within storage devices.
[Related Application Data]
This application claims the benefit of U.S. Provisional Patent Application No. 62/834,900, filed April 16, 2019, U.S. Provisional Patent Application No. 62/945,877, filed December 9, 2019, and U.S. Provisional Patent Application No. 62/945,883, filed December 9, 2019, all of which are incorporated by reference herein for all purposes.
This application is related to co-pending U.S. Patent Application No. 16/820,665, filed March 16, 2020, which is incorporated by reference herein for all purposes.
Storage devices such as Solid State Drives (SSDs) may store relatively large amounts of data. A host processor may request data from the SSD in order to perform operations on that data. Depending on the specific architecture connecting the host processor and the SSD, transferring this data to the host processor may take a relatively large amount of time. For example, if the host processor and the SSD are connected using four lanes of third-generation Peripheral Component Interconnect Express (PCIe), the maximum amount of data that may be carried between the SSD and the host processor is approximately 4 gigabytes (GB) per second.
A need remains to reduce the amount of data sent to the host and to take advantage of the benefits of a columnar format.
According to an example embodiment, a transcoder includes: a buffer to store input encoded data; an index mapper to map from an input dictionary to an output dictionary; a current encoding buffer to store modified current encoded data, the modified current encoded data being responsive to the input encoded data, the input dictionary, and the mapping from the input dictionary to the output dictionary; a previous encoding buffer to store modified previous encoded data, the modified previous encoded data being responsive to previous input encoded data, the input dictionary, and the mapping from the input dictionary to the output dictionary; and a rule evaluator to generate an output stream responsive to the modified current encoded data in the current encoding buffer, the modified previous encoded data in the previous encoding buffer, and transcoding rules.
According to an example embodiment, a method of transcoding includes: receiving, at a transcoder, a first chunk of data from input encoded data from a storage device; determining that a host computer is interested in the first chunk of data; generating first encoded data from the first chunk of data based at least in part on the host computer being interested in the first chunk of data; receiving, at the transcoder, a second chunk of data from the input encoded data from the storage device; determining that the host computer is not interested in the second chunk of data; generating second encoded data from the second chunk of data based at least in part on the host computer not being interested in the second chunk of data; and outputting the first encoded data and the second encoded data to the host computer.
According to an example embodiment, an article for transcoding includes a non-transitory storage medium having stored thereon instructions that, when executed by a machine, result in: receiving, at a transcoder, a first chunk of data from input encoded data from a storage device; determining that a host computer is interested in the first chunk of data; generating first encoded data from the first chunk of data based at least in part on the host computer being interested in the first chunk of data; receiving, at the transcoder, a second chunk of data from the input encoded data from the storage device; determining that the host computer is not interested in the second chunk of data; generating second encoded data from the second chunk of data based at least in part on the host computer not being interested in the second chunk of data; and outputting the first encoded data and the second encoded data to the host computer.
105: Host computer/machine
110: Processor
115: Memory
120: Storage device/SSD
125: Memory controller
130: Device driver
205: Clock
210: Network connector
215: Bus
220: User interface
225: Input/output engine
305, 515: Storage
310, 320: Arrows
315: In-storage processor/in-storage computing
405: Compressed data
410: Decompressor
415: Decompressed data
420: Transcoder
425: Transcoded data
430: Decoder
435: Filtered plain data
505: Host Interface Layer (HIL)
510: SSD controller/storage device controller
515-1, 515-2, 515-3, 515-4, 515-5, 515-6, 515-7, 515-8: Flash memory chips
520-1, 520-2, 520-3, 520-4: Channels
525: Translation layer
530: File-to-block mapping
605: Circular buffer/buffer
610: Stream splitter
615: Index mapper
620: Current encoding buffer
625: Previous encoding buffer
630: Transcoding rules
635: Rule evaluator
705-1, 705-2, 705-3: Chunks
805: Input dictionary
810: Output dictionary
905: File metadata
910-1, 910-2, 910-3: Column chunks
915: File-to-block mapping
920, 925: Dictionary pages
930-1, 930-2, 930-3: Data pages
1005: In-storage computing controller
1010: Column chunk processor
1105: Input buffer
1110: Output buffer
1115: Predicate evaluator
1120: "Don't care" evaluator
1205, 1210, 1215, 1220, 1225, 1230, 1235, 1240, 1245, 1250, 1255, 1260, 1265, 1270, 1275, 1280, 1285, 1290, 1295, 1305, 1310, 1315, 1405, 1410, 1420, 1425, 1430, 1440, 1445, 1505, 1510, 1515, 1520, 1525, 1605, 1610, 1615, 1620, 1625, 1630, 1635, 1640, 1645, 1650, 1655, 1660, 1665, 1670: Blocks
1415, 1435: Dashed lines
FIG. 1 shows a system including a storage device, such as a Solid State Drive (SSD), that may support transcoding of encoded data, according to an embodiment of the inventive concept.
FIG. 2 shows some additional details of the machine of FIG. 1.
FIG. 3 shows the storage device of FIG. 1 and the processor of FIG. 1 communicating the same data using different approaches.
FIG. 4 shows the storage device of FIG. 1 and the processor of FIG. 1 communicating transcoded data, according to an embodiment of the inventive concept.
FIG. 5 shows details of the storage device of FIG. 1, according to an embodiment of the inventive concept.
FIG. 6 shows details of the transcoder of FIG. 4, according to an embodiment of the inventive concept.
FIG. 7 shows the stream splitter of FIG. 6 dividing input encoded data into chunks, according to an embodiment of the inventive concept.
FIG. 8 shows the index mapper of FIG. 6 mapping an input dictionary to an output dictionary, according to an embodiment of the inventive concept.
FIG. 9 shows an example file stored in a columnar format.
FIG. 10 shows the storage device of FIG. 1 configured to implement transcoding where data is stored in a columnar format, according to an embodiment of the inventive concept.
FIG. 11 shows the column chunk processor of FIG. 10 configured to implement transcoding where data is stored in a columnar format, according to an embodiment of the inventive concept.
FIGs. 12A-12C show a flowchart of an example procedure for the transcoder of FIGs. 4 and 6 to transcode data, according to an embodiment of the inventive concept.
FIG. 13 shows a flowchart of an example procedure for the stream splitter of FIG. 6 to divide input encoded data into chunks.
FIGs. 14A-14B show a flowchart of an example procedure for the column chunk processor of FIG. 10 and/or the transcoder of FIG. 4 to transcode data stored in a columnar format, according to an embodiment of the inventive concept.
FIG. 15 shows a flowchart of an example procedure for the index mapper of FIG. 6 to map an input dictionary to an output dictionary, according to an embodiment of the inventive concept.
FIGs. 16A-16B show a flowchart of an example procedure for the in-storage computing controller of FIG. 10 to manage predicates received from the host computer of FIG. 1 and potentially perform acceleration functions on transcoded data, according to an embodiment of the inventive concept.
Reference will now be made in detail to embodiments of the inventive concept, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth to enable a thorough understanding of the inventive concept. It should be understood, however, that persons having ordinary skill in the art may practice the inventive concept without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are used only to distinguish one element from another. For example, a first module could be termed a second module, and, similarly, a second module could be termed a first module, without departing from the scope of the inventive concept.
The terminology used in the description of the inventive concept herein is for the purpose of describing particular embodiments only and is not intended to limit the inventive concept. As used in the description of the inventive concept and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The components and features of the drawings are not necessarily drawn to scale.
Placing some processing capability closer to the SSD (for example, using a Field Programmable Gate Array (FPGA), an Application-Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or some other processor) offers some advantages. First, the connection between the SSD and the nearby processor may support higher bandwidth than the connection between the SSD and the host processor, allowing for faster data transfer. Second, by relieving the host processor of having to process the data, the host processor may perform other functions while the nearby processor handles the data processing.
However, near-storage processing of data has potential drawbacks when the data is compressed or encoded. To operate on the raw data, some near-storage processors may first decompress or decode the data before they can operate on it. In addition, the near-storage processor may report results back to the host processor. If the amount of data in the results sent to the host processor is greater than the amount of raw data, the benefit of using the near-storage processor may be lost or, in the worst case, more data may be sent to the host processor than if the compressed or encoded data had simply been sent to the host processor in the first place.
In addition, although data may be transcoded in general, when the data is stored in a columnar format, some adaptations may be made to take advantage of the columnar format.
Near-data processing of data in a compressed format may negate some of the benefits of offloading. If the connection between the SSD and the host processor supports transferring X bytes/second, the data is compressed with a compression ratio of Y, and the amount of data selected for transfer is Z, then the amount of data sent from the nearby processor to the host processor may be X * Y * Z. If this product is less than X, that is, if Y * Z < 1, then acceleration (near processing) may be beneficial.
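As a rough illustration of the Y * Z < 1 criterion, the following Python sketch evaluates the product for sample numbers. It assumes that Z is interpreted as the fraction of the decompressed data selected for transfer; the function names and the sample values are hypothetical and are not taken from the specification.

```python
def host_bound_rate(link_rate, compression_ratio, selectivity):
    """Return X * Y * Z: the data rate sent on to the host.

    link_rate         -- X, bytes/second of compressed data read
    compression_ratio -- Y, plain size divided by compressed size
    selectivity       -- Z, assumed here to be the fraction of plain data selected
    """
    return link_rate * compression_ratio * selectivity


def offload_is_beneficial(compression_ratio, selectivity):
    """Near processing helps when the product Y * Z is below 1."""
    return compression_ratio * selectivity < 1.0


# Hypothetical numbers: a 4 GB/s link, 2x compression, 25% of the rows selected.
rate = host_bound_rate(4e9, 2.0, 0.25)
beneficial = offload_is_beneficial(2.0, 0.25)
```

With a higher compression ratio or a less selective filter (for instance Y = 4 and Z = 0.5), the product exceeds 1 and the filtered plain data would be larger than the compressed input.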
In some embodiments of the inventive concept, columnar storage may use data encoding (for example, Run Length Encoding (RLE)) and/or compression (for example, a high-speed compression scheme) to reduce the storage footprint. Encoding, rather than compression, may provide the main entropy reduction. The compression ratio after encoding tends to be small (less than about 2).
In some embodiments of the inventive concept, encoded data may be processed near storage without expanding the result (that is, without making the result sent to the host processor larger than the encoded raw data that would otherwise have been sent to the host processor), based at least in part on, for example, the encoding algorithm. Encoding algorithms that may be used without expanding the result include, but are not limited to, dictionary compression, prefix encoding, Run Length Encoding (RLE), cluster encoding, sparse encoding, and indirect encoding. Other encoding algorithms may also be used with embodiments of the inventive concept. Although the embodiments described below may focus on RLE and bit packing, embodiments of the inventive concept may be extended to cover other encoding algorithms.
There is also an additional problem: how to tell the transcoder which data to filter. This is particularly problematic where a dictionary that reduces the size of the stored data may be stored somewhere other than where the data is stored. Columnar storage, an example of such a storage format, simplifies locating the data of interest. However, because the dictionary may be stored separately from the data, the system may need to be able to locate both the dictionary and the data in question to perform transcoding.
Embodiments of the inventive concept enable encoded data to be filtered without expanding the data. The filtered data may be re-encoded using transformation rules and the encoding information embedded in the encoded data. The transcoder in embodiments of the inventive concept may filter the encoded data and modify the encoding sent to the host. Thus, rather than the host having to process plain data (which may be very large relative to the encoded/compressed data, depending on the compression algorithm and/or the effectiveness of the encoding), the host may receive and process encoded data. Because the bandwidth between the host and the storage device may be limited, which materially affects the time it takes to transfer data, sending encoded data may save processing time compared with sending plain data (filtered or unfiltered).
A circular buffer may store enough data to be processed at one time. Embodiments of the inventive concept may replace the circular buffer with a buffer that uses another structure.
An index mapper may provide a mapping from the input dictionary to a reduced dictionary to be used with the output stream.
The current encoding buffer may store data read from the input stream according to the appropriate encoding. Using information from the transcoding rules, the current encoding buffer, and the previous encoding buffer, the rule evaluator may decide how to process the data in the current encoding buffer. Depending on whether the data in the current encoding buffer can be combined with the data in the previous encoding buffer, the rule evaluator may update the previous encoding buffer based on the data in the current encoding buffer, output the previous encoding buffer (and replace the previous encoding buffer with the current encoding buffer), or take some other action. For example, if the transcoder has identified values in the current encoding buffer that are considered "don't care" values (discussed further below), those values may be combined with existing "don't care" values in the previous encoding buffer.
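The interplay between the current and previous encoding buffers can be sketched in Python as follows. This is a minimal illustration, assuming each buffer holds one RLE run as a (count, value) pair and that the only rule is "merge runs with equal values"; the name evaluate_rules is hypothetical, and the actual transcoding rules may be richer.

```python
def evaluate_rules(runs):
    """Merge adjacent (count, value) runs that carry the same value.

    'previous' plays the role of the previous encoding buffer and
    'current' the role of the current encoding buffer.
    """
    output = []
    previous = None
    for current in runs:
        if previous is not None and previous[1] == current[1]:
            # Combinable: fold the current buffer into the previous one.
            previous = (previous[0] + current[0], previous[1])
        else:
            # Not combinable: emit the previous buffer, then replace it.
            if previous is not None:
                output.append(previous)
            previous = current
    if previous is not None:
        output.append(previous)  # flush the last pending run
    return output


merged = evaluate_rules([(2, 0), (3, 0), (1, 1), (4, 0)])
```

Here the first two runs share the value 0 and collapse into a single run of length 5, while the later run of value 0 stays separate because a run of value 1 lies between them.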
A stream splitter may be used to identify different portions (streams) of the input stream that are encoded using different encodings. If a single encoding scheme is used, the encoding scheme may be passed as a parameter (that is, the encoding type). Otherwise, if multiple encoding schemes are used (that is, no encoding type is passed), the encoding scheme for a given stream is determined by inspecting the input stream itself. For example, the first byte of data stored in a columnar storage format may contain encoding type information. For a hybrid of RLE and bit packing, if the Least Significant Bit (LSB) is 0, then the encoding type is RLE; if the LSB is 1, then the encoding type is bit packing.
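The LSB test described above is small enough to sketch directly. The function name and the returned labels below are hypothetical; only the rule "LSB 0 means RLE, LSB 1 means bit packing" comes from the text.

```python
def encoding_type(header_byte):
    """Classify a run from the least significant bit of its first byte."""
    if not 0 <= header_byte <= 0xFF:
        raise ValueError("header must fit in one byte")
    return "bit-packing" if header_byte & 1 else "RLE"


kind_a = encoding_type(0b00010010)  # LSB is 0
kind_b = encoding_type(0b00000011)  # LSB is 1
```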
As examples of how various encodings work, consider RLE and Bit Packing (BP). In RLE, a variable-length unsigned integer represents the number of times a value is repeated, followed by the fixed-length value itself. Thus, for example, instead of sending 00000011 00000011 00000011 00000011 00000011 00000011 00000011 00000011 00000011 (9 copies of the decimal value 3), the data may be encoded as 00001001 (decimal value 9) 00000011 (decimal value 3), indicating that 00000011 should be repeated 9 times.
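The RLE example above can be reproduced with a short Python sketch. It represents each run as a (count, value) pair rather than as raw bytes, which is a simplification; the function names are hypothetical.

```python
def rle_encode(values):
    """Collapse consecutive equal values into (count, value) runs."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v:
            runs[-1] = (runs[-1][0] + 1, v)  # extend the open run
        else:
            runs.append((1, v))              # start a new run
    return runs


def rle_decode(runs):
    """Expand (count, value) runs back into the original sequence."""
    out = []
    for count, value in runs:
        out.extend([value] * count)
    return out


runs = rle_encode([3] * 9)  # nine copies of decimal 3 become one run
```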
In BP, data determined to occupy less space may be combined with other values. For example, if data is normally stored using 8 bits, storing 4 values takes a total of 32 bits. However, if each value is known to occupy no more than 4 bits, two values may be stored in a single byte: in short, this is bit packing. Because there is some overhead in indicating which data is packed and which is not, the space saving is somewhat less than described, but it is still beneficial.
The encoding includes the number of groups in an unsigned byte, followed by a list of packed values in one or more bytes. The maximum number of values in a group may be 8, and the maximum number of groups may be 63. Thus, for example, to represent the data 00000000 00000001 00000000 00000001 00000000 00000001 00000000 00000001 (decimal values 0 1 0 1 0 1 0 1), the group may be defined as 0000001 (group 1) 00010000 (0, 1) 00010000 (0, 1) 00010000 (0, 1) 00010000 (0, 1).
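A minimal Python sketch of the packing step follows. It assumes a bit width of 4 and places the first value of each pair in the low nibble, which reproduces the byte 00010000 for the pair (0, 1) in the example; the group-count header byte is deliberately left out, and the function names are hypothetical.

```python
def pack_nibbles(values):
    """Pack pairs of 4-bit values into bytes, first value in the low nibble."""
    if len(values) % 2:
        raise ValueError("expected an even number of values")
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        if not (0 <= lo < 16 and 0 <= hi < 16):
            raise ValueError("each value must fit in 4 bits")
        out.append(lo | (hi << 4))
    return bytes(out)


def unpack_nibbles(data):
    """Reverse pack_nibbles: low nibble first, then high nibble."""
    values = []
    for b in data:
        values.append(b & 0x0F)
        values.append(b >> 4)
    return values


packed = pack_nibbles([0, 1, 0, 1, 0, 1, 0, 1])  # eight values in four bytes
```

Eight one-byte values shrink to four payload bytes, so even after adding a header byte the packed form is smaller than the 8-byte original.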
As described above, RLE (and other encodings) may use variable-length unsigned integers.
Variable-length unsigned integers may themselves be encoded. In each group of eight bits, the most significant bit may indicate whether the current byte is the last byte of the value or whether at least one more byte follows. Where multiple bytes are used, the least significant byte is presented first and the most significant byte last. Thus, for example, decimal value 1 may be represented as 00000001, decimal value 2 as 00000010, and so on up to 01111111 (decimal value 127). Decimal value 128 may be represented as 10000000 00000001, decimal value 129 as 10000001 00000001, and so on. In essence, the binary value is divided into groups of 7 bits, with each 7-bit group except the most significant one preceded by a 1. For example, decimal value 16,384 may be represented as 10000000 10000000 00000001.
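The variable-length integer scheme described here (7 payload bits per byte, continuation flag in the most significant bit, least significant group first) can be sketched in Python as follows. The function names are hypothetical.

```python
def encode_varuint(n):
    """Encode a non-negative integer, least significant 7-bit group first."""
    if n < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # more bytes follow: set the top bit
        else:
            out.append(group)         # last byte: top bit stays clear
            return bytes(out)


def decode_varuint(data):
    """Decode one value from the start of data."""
    result = 0
    shift = 0
    for b in data:
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result
        shift += 7
    raise ValueError("truncated variable-length integer")


encoded_128 = encode_varuint(128)      # two bytes: 10000000 00000001
encoded_16384 = encode_varuint(16384)  # three bytes: 10000000 10000000 00000001
```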
When a transcoder is used to process encoded data, some data may be considered "don't care" data. That is, there may be some data that has no value to the operation being performed. As a result of the transcoder's operation, data considered "don't care" data may be mapped to a different value.
Consider a situation in which a database stores citizenship information for various people. Citizenship may be stored using strings (for example, "China", "Korea", "India", "United States", and so on). However, because the possible values for citizenship are drawn from a finite set, a dictionary may be used to reduce the amount of data stored in the database. Thus, for example, the value "0" may represent China, the value "1" may represent India, the value "2" may represent Korea, and the value "3" may represent the United States, with the representative value (the index) stored in the database instead of the country name. Because there are 195 countries (as of July 19, 2019), a single byte may be used to store the index, which is far fewer bytes than would be used to store the country name as a string at one byte per character.
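The dictionary encoding of the citizenship column can be sketched as follows. Sorting the distinct names to assign indexes is an assumption made here so that the result matches the 0 to 3 numbering in the example; the function name is hypothetical.

```python
def dictionary_encode(values):
    """Return (dictionary, indexes) for a column of string values."""
    dictionary = sorted(set(values))  # assumed ordering: alphabetical
    index_of = {name: i for i, name in enumerate(dictionary)}
    return dictionary, [index_of[v] for v in values]


column = ["Korea", "China", "United States", "India", "China"]
dictionary, indexes = dictionary_encode(column)
```

Each stored index fits in one byte as long as the dictionary has at most 256 entries, whereas a string such as "United States" needs 13 bytes at one byte per character.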
However, the acceleration operation being performed may be interested only in US citizens: for example, the operation may be to count the number of US citizens in the database. Citizens of other countries are therefore irrelevant to the operation: they are "don't care" values. The transcoder may map the dictionary and the indices to reflect the data to which the operation applies.
Columnar formats may encode information using RLE or bit packing. Given a portion of a string of values stored in a columnar storage format, one bit may indicate whether the data is stored using RLE or bit packing; the rest of the data can then be interpreted accordingly.
To understand how a transcoder according to embodiments of the inventive concept may provide a replacement dictionary for encoded data, consider data that includes citizenship information for a large number of people. The name of the country of which each person is a citizen can be long, but the number of country names is relatively small (even representing 200 countries would take only about 8 bits, still a significant saving over storing each citizen's country name as a string at one byte per character), so a dictionary provides a meaningful reduction in the amount of data stored. Such encoding may use any desired encoding scheme: for example, RLE encoding, dictionary compression, prefix encoding, bit packing, cluster encoding, sparse encoding, and indirect encoding.
Now, if the predicate being applied (the filter on the data) selects only citizens of the United States, data about other citizens is not of interest. For example, the host may want to know how many US citizens are in the database. As a result of transcoding, the dictionary may be reduced to a single entry for US citizens (there may also be an implicit or explicit entry for "don't care" values), and the RLE encoding may be "compacted" by merging adjacent RLE entries for citizens of the various non-US countries. The encoding of the data is thus compacted into a dictionary of 1 (or 2) rows. The encoded data itself may also shrink, because the data for all non-US citizens can index a single entry in the new dictionary. Thus, by pushing the predicate down into the transcoder, the encoded data may be filtered and a new encoding provided, reducing the amount of data ultimately sent to the host. A dictionary map may represent the mapping from the original dictionary to the transcoded dictionary.
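The merging of adjacent RLE entries described above can be illustrated with a short sketch. It assumes runs are held as `(count, index)` pairs and that a dictionary map from old indices to new indices is already available; the function name `transcode_rle` and the placement of the "don't care" entry at index 0 are assumptions made for illustration only.

```python
def transcode_rle(runs, index_map):
    """Remap each run's dictionary index and merge neighbours that now share an index.

    runs      -- list of (count, old_index) pairs
    index_map -- dict mapping old dictionary index -> new dictionary index
    """
    merged = []
    for count, old_index in runs:
        new_index = index_map[old_index]
        if merged and merged[-1][1] == new_index:
            # Same value as the previous run after remapping: extend that run.
            merged[-1] = (merged[-1][0] + count, new_index)
        else:
            merged.append((count, new_index))
    return merged


# Old dictionary: 0=China, 1=India, 2=Korea, 3=United States.
# Assumed new dictionary: 0="don't care", 1=United States.
index_map = {0: 0, 1: 0, 2: 0, 3: 1}
runs = [(2, 3), (4, 2), (1, 0), (1, 1), (5, 0), (1, 3)]
print(transcode_rle(runs, index_map))  # [(2, 1), (11, 0), (1, 1)]
```

Six runs collapse to three because the four middle runs all map to the single "don't care" entry.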
The transcoder (and other features) may be implemented using a Field Programmable Gate Array (FPGA), but embodiments of the inventive concept may include other implementations, including for example an Application-Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or some other processor that executes software. In addition, the in-storage compute (ISC) controller may be separate from the FPGA, or it may be implemented as part of the FPGA.
Given a particular file on which an acceleration function (such as filtering) is to be performed, the ISC controller may use a File2Block Map to identify the blocks that store the file's data, and their order. The ISC controller may be implemented as a component within the host (separate from the storage device itself), or it may be a controller that is part of the storage device. These blocks may be accessed to provide an input stream that can be fed (via an input buffer) into the transcoder.
When a file is stored in a columnar format, the unit of data may be a column chunk, and the column chunk itself may include several data pages. That is, the input buffer may receive a column chunk from the storage modules in the storage device so that the transcoder can operate on that column chunk. In general, each column chunk may include its own metadata, which may specify the encoding scheme used for the column chunk and/or the dictionary to be applied to the data in the column chunk. But not all storage formats use this arrangement: for example, a columnar storage format may store metadata in a separate area of the file (rather than within each column chunk), and that metadata may specify the encoding and dictionary used by the entire file. Thus, when a file is stored using such a columnar storage format, the ISC controller may retrieve the encoding and dictionary from the file's metadata area (located using the File2Block Map) and provide that information to the transcoder, rather than assuming that the transcoder will receive whatever information it needs from the column chunk. (Of course, when such a columnar storage format is used, a dictionary page may not be present in the column chunk.) Note that although the same encoding scheme may apply to every column chunk, the encoding scheme itself may be a hybrid scheme, that is, one that uses two or more different encoding schemes and switches between them as appropriate. For example, a hybrid encoding scheme may combine RLE encoding with bit packing.
In addition to determining the dictionary and the encoding scheme, the ISC controller may also extract the predicate to be applied to the encoded data and push that predicate down to the transcoder. The transcoder may then use all of this information in various ways. For example, information about the encoding used with the file may be used to select the transcoding rules to be used with the data, while the dictionary and the predicate may be used to generate the transcoding dictionary and the dictionary map.
A Predicate Evaluator may use the predicate to determine which entries in the dictionary are of interest and which are not, producing a transcoding dictionary that stores the values of interest (and possibly a value representing the "don't care" entries) and a dictionary map that maps indices from the original dictionary to the transcoding dictionary.
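A minimal sketch of what such a predicate evaluator might compute is shown below. It assumes the dictionary is a list indexed by position, the predicate is a Python callable, and an explicit "don't care" entry is placed at index 0; the function name `build_transcoding_dictionary` and the placeholder label are hypothetical.

```python
DONT_CARE = "Don't Care"  # placeholder label; any name could be used


def build_transcoding_dictionary(dictionary, predicate):
    """Return (transcoding_dictionary, dictionary_map) for a given predicate.

    dictionary -- list of values, where the list position is the original index
    predicate  -- callable returning True for values of interest
    """
    transcoded = [DONT_CARE]          # index 0 stands for every non-matching value
    dictionary_map = {}
    for old_index, value in enumerate(dictionary):
        if predicate(value):
            dictionary_map[old_index] = len(transcoded)
            transcoded.append(value)
        else:
            dictionary_map[old_index] = 0
    return transcoded, dictionary_map


original = ["China", "India", "Korea", "United States"]
new_dict, mapping = build_transcoding_dictionary(original, lambda v: v == "United States")
print(new_dict)  # ["Don't Care", 'United States']
print(mapping)   # {0: 0, 1: 0, 2: 0, 3: 1}
```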
If the transcoding dictionary includes an entry for the "don't care" value, the operation technically adds an entry to the dictionary (since the original dictionary included no such value). Adding such an entry can introduce new problems. The "don't care" entry is typically added as the first entry (index 0) of the transcoding dictionary and represents values that do not match the predicate. But creating a new value for the "don't care" entry can be expensive: the disclosed system may scan and remap the entire dictionary (since every existing index shifts by 1). Adding the "don't care" entry may also cause memory reallocation or bit-width overflow: for example, if every possible value of a given number of bits is already in use as a dictionary index, adding a "don't care" entry may increase the number of bits used to represent an index by 1. If a data page uses only part of the dictionary, the data page may have a smaller bit width, and adding a "don't care" entry to the transcoding dictionary means that one valid value may no longer be usable in the data page. For example, if the bit width is 1, adding a "don't care" entry may involve more values than a single bit can represent, whereas if the bit width is 2, there may be room for the "don't care" entry without overflowing the bit width.
A solution to this problem is to determine whether predicate pushdown will reduce the dictionary size at all. If the dictionary will shrink by at least one entry, there is room for the "don't care" entry with no risk of bit-width overflow. If the dictionary will not shrink by at least one entry, the encoded data may be sent directly to the ISC controller/host without transcoding, thereby avoiding the possibility that transcoding increases the amount of data.
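The check described above can be sketched as a small decision function. This is an illustrative sketch under the assumption that the index bit width is the minimum number of bits able to address every dictionary entry; `bit_width` and `should_transcode` are hypothetical names.

```python
def bit_width(entry_count: int) -> int:
    """Minimum number of bits needed to index entry_count dictionary entries."""
    return max(1, (entry_count - 1).bit_length())


def should_transcode(original_entries: int, matching_entries: int) -> bool:
    """Transcode only if the predicate removes at least one dictionary entry.

    When at least one entry is dropped, the added "don't care" entry fits in
    the freed slot, so the index bit width cannot grow.
    """
    return matching_entries < original_entries


# Four entries, one match: new dictionary has 2 entries (match + "don't care").
print(should_transcode(4, 1), bit_width(4), bit_width(1 + 1))  # True 2 1
# Four entries, all match: adding "don't care" would need 5 entries and 3 bits.
print(should_transcode(4, 4), bit_width(4), bit_width(4 + 1))  # False 2 3
```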
Note that the output of the transcoder may be sent back (via an output buffer) to the ISC controller. This serves two purposes. First, although pushing the predicate down into the transcoder produces transcoded data, there may still be operations to perform on the transcoded data. For example, if the host is trying to count the number of US citizens in the file, the transcoded data will identify those citizens but will not count them: that operation may be performed by the ISC controller as an acceleration function. Second, the transcoded data may be sent back to the host for further processing. The ISC controller communicates with the host and therefore provides a path for sending the transcoded data to the host.
FIG. 1 shows a system including a solid-state drive (SSD) that may support transcoding of encoded data, according to an embodiment of the inventive concept. In FIG. 1, machine 105, which may be a host computer, may include processor 110, memory 115, and storage device 120. Processor 110 may be any variety of processor. Although FIG. 1 shows a single processor 110, machine 105 may include any number of processors, each of which may be a single-core or multi-core processor, mixed in any desired combination.
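Processor 110 may be coupled to memory 115. Memory 115 may be any variety of memory, such as flash memory, Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), persistent random access memory, Ferroelectric Random Access Memory (FRAM), or Non-Volatile Random Access Memory (NVRAM) (such as Magnetoresistive Random Access Memory (MRAM)). Memory 115 may also be any desired combination of different memory types and may be managed by memory controller 125. Memory 115 may be used to store data that may be termed "short-term": that is, data not expected to be stored for long periods of time. Examples of short-term data include temporary files, data being used locally by applications (which may have been copied from other storage locations), and the like.
Processor 110 and memory 115 may also support an operating system under which various applications may run. These applications may issue requests to read data from, or write data to, either memory 115 or storage device 120. Whereas memory 115 may be used to store data that may be termed "short-term", storage device 120 may be used to store data that is considered "long-term": that is, data expected to be stored for long periods of time. Storage device 120 may be accessed using device driver 130. Storage device 120 may take any desired form, such as a hard disk drive, a solid-state drive (SSD), or any other desired form.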
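FIG. 2 shows details of the machine of FIG. 1. In FIG. 2, typically, machine 105 may include one or more processors 110, which may include memory controller 125 and clock 205, which may be used to coordinate the operations of the components of the machine. Processors 110 may also be coupled to memory 115, which may include random access memory (RAM), read-only memory (ROM), or other state-preserving media, as examples. Processors 110 may also be coupled to storage device 120 and to network connector 210, which may be, for example, an Ethernet connector or a wireless connector. Processors 110 may also be connected to bus 215, to which may be attached user interface 220 and input/output interface ports that may be managed using input/output engine 225, among other components.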
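FIG. 3 shows storage device 120 of FIG. 1 and processor 110 of FIG. 1 communicating the same data using different approaches. In one approach (the conventional approach), data may be read from storage 305 within storage device 120 (which may be, for example, the platters of a hard disk drive or the flash memory chips of a flash storage device such as an SSD) and sent directly to processor 110. If the total data stored on storage device 120 (encoded and/or compressed) is X bytes, then that is the amount of data to be sent to processor 110. Note that this analysis considers the amount of storage used for the encoded and/or compressed data: the unencoded, uncompressed data would presumably be a larger number of bytes (otherwise there would be no benefit to encoding and/or compressing the data). Thus, for example, if the data would use about 10 GB of storage when unencoded and uncompressed but about 5 GB when encoded and/or compressed, then about 5 GB of data, rather than about 10 GB, may be transferred from storage device 120 to processor 110.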
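The transfer of data from storage device 120 to processor 110 may also be considered in terms of the bandwidth available for the transfer (and therefore the time taken to complete it). If the data stored on storage device 120 is encoded and/or compressed, then when that data is sent directly to processor 110 (shown by arrow 310), the total data stored on storage device 120 may be sent at an effective rate of B bytes/second. Continuing the earlier example, consider the case where the connection between storage device 120 and processor 110 has a bandwidth of about 1 GB/second. Because the encoded and/or compressed data may occupy about 5 GB of space, it may be sent over the roughly 1 GB/second connection in a total of about 5 seconds. But because the total data stored (before encoding and/or compression) is about 10 GB, the effective transfer rate B is about 2 GB/second (since about 10 GB of unencoded, uncompressed data is delivered in about 5 seconds).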
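By contrast, if in-storage processor 315 is used to pre-process the data in an attempt to reduce the amount of data sent to processor 110, less raw data may be sent (since in-storage processor 315 can be more selective about what data is sent). On the other hand, in-storage processor 315 may decompress the data (and possibly decode it as well) in order to process it. Thus, the amount of data to be sent from in-storage processor 315 to processor 110 may be reduced by data selection but may also be increased by the compression (and possibly encoding) factor: algebraically, the amount of data to be sent from in-storage processor 315 to processor 110 (shown by arrow 320) may be expressed as X * Y * Z gigabytes (GB), where X is the amount of space used to store the encoded and/or compressed data, Y is the compression ratio (how much the data storage was reduced by compression, and possibly encoding), and Z is the selectivity (how much of the uncompressed data is selected). Similarly, the effective rate at which data may be sent from in-storage processor 315 to processor 110 becomes B * Y * Z bytes/second.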
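A simple comparison of the two formulas shows that using in-storage processor 315 to select the data to send to processor 110 is superior when X * Y * Z < X (or B * Y * Z < B), that is, when Y * Z < 1. Otherwise, even with in-storage processor 315 applying its selectivity, the amount of data to be sent after pre-processing by in-storage processor 315 is greater than the amount of encoded and/or compressed data: simply sending the original encoded and/or compressed data would be more efficient than having in-storage processor 315 try to select the data to send to processor 110.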
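The comparison above reduces to a one-line test, sketched here with the running example's numbers. The sketch assumes Y is the ratio of uncompressed size to stored size (2 for 10 GB stored as 5 GB) and Z is the fraction of the uncompressed data that is selected; both function names are hypothetical.

```python
def data_sent_gb(stored_gb: float, compression_ratio: float, selectivity: float) -> float:
    """Amount sent after in-storage selection: X * Y * Z."""
    return stored_gb * compression_ratio * selectivity


def selection_pays_off(compression_ratio: float, selectivity: float) -> bool:
    """In-storage selection sends less than the stored data only when Y * Z < 1."""
    return compression_ratio * selectivity < 1


# X = 5 GB stored, Y = 2 (10 GB uncompressed), Z = 0.25 selected.
print(data_sent_gb(5, 2, 0.25), selection_pays_off(2, 0.25))  # 2.5 True
# With Z = 0.75, more data would be sent than the 5 GB already stored.
print(data_sent_gb(5, 2, 0.75), selection_pays_off(2, 0.75))  # 7.5 False
```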
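FIG. 4 shows storage device 120 of FIG. 1 and processor 110 of FIG. 1 communicating transcoded data, according to an embodiment of the inventive concept. In FIG. 4, encoded and/or compressed data is stored in storage 305 (again, storage 305 may represent the platters of a hard disk drive, the flash memory chips of a flash storage device such as an SSD, or some other physical data storage). This data, compressed data 405, may be passed to decompressor 410, which may decompress the data to produce decompressed data 415. Decompressor 410 (also termed a decompression engine) may be implemented using hardware decompression or by software running on an appropriate circuit, such as a general-purpose processor, a Field Programmable Gate Array (FPGA), an Application-Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or a General-Purpose GPU (GPGPU). Decompressed data 415 may still be encoded, since encoding and compression may be separate processes. Decompressed data 415 may then be passed to transcoder 420, which may perform transcoding on the data. Transcoding may be thought of as the process of converting data from one encoding to another.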
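All of the above processes may happen within storage device 120. But once transcoder 420 has processed decompressed data 415 and produced transcoded data 425, transcoded data 425 may be provided to host computer 105. Decoder 430 may then decode transcoded data 425, producing filtered plain data 435. Filtered plain data 435 may then be provided to processor 110, which may perform any desired operation on filtered plain data 435.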
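Note that for decoder 430, decoding transcoded data 425 may involve knowing information about the encoding applied to transcoded data 425. This information may include, for example, the particular encoding scheme used in transcoded data 425 or the dictionary used in transcoded data 425. Although FIG. 4 does not show this information being passed from storage device 120 to host computer 105, it may be passed to host computer 105 in parallel with transcoded data 425 (or as part of transcoded data 425). Of course, if transcoded data 425 is actually unencoded and uncompressed (which may happen if the output of transcoder 420 would result in sending a larger number of actual bytes than sending the unencoded, uncompressed data), then transcoded data 425 may omit any information about an encoding scheme or dictionary.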
At this point, it is worth discussing the difference between encoding and compression. The two concepts are related (both attempt to reduce the amount of storage used to store data), but there are some differences. Encoding typically involves a dictionary that provides indices for data that would be lengthy if included directly and that has a relatively low number of distinct values. For example, there are approximately 195 different countries. If the data stores information about the citizenship of a large number of people, including each person's country directly would use a significant amount of space: at least several bytes (assuming one byte per character in the country's name). The values 1 to 195, on the other hand, can each be represented using a single byte. Using a dictionary to represent the country names and storing the index of the appropriate country name in the data can significantly reduce the amount of data to be stored without losing any information. For example, the information "United States of America, United States of America, Korea, Korea, Korea, Korea, China, India, China, China, China, China, China, United States of America" may instead be represented using the dictionary shown in Table 1, so that the information is represented as "3, 3, 2, 2, 2, 2, 0, 1, 0, 0, 0, 0, 0, 3": a reduction from 153 characters to 40 characters. Even taking into account the 52 characters of the dictionary, simply using the dictionary yields significant savings.
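The dictionary substitution in this example can be reproduced with a short sketch. Assigning indices in alphabetical order is an assumption made here because it happens to match the index values quoted in the text; the function name `dictionary_encode` is hypothetical.

```python
def dictionary_encode(values):
    """Return (dictionary, indices), with dictionary entries in sorted order."""
    dictionary = sorted(set(values))
    index_of = {value: index for index, value in enumerate(dictionary)}
    return dictionary, [index_of[value] for value in values]


countries = (
    ["United States of America"] * 2
    + ["Korea"] * 4
    + ["China", "India"]
    + ["China"] * 5
    + ["United States of America"]
)
dictionary, indices = dictionary_encode(countries)
print(dictionary)  # ['China', 'India', 'Korea', 'United States of America']
print(indices)     # [3, 3, 2, 2, 2, 2, 0, 1, 0, 0, 0, 0, 0, 3]
```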
The value of a dictionary may diminish as the number of values in the dictionary grows. For example, if there are 1,000,000 different possible values, each index may need 20 bits. This may still be fewer than the number of bits needed to store the value directly, but the benefit of encoding (relative to storing unencoded data) may be reduced. And if the value stored for each entry in the data can be unique, or if the space used to store an index is about the same as the space used to store a value, then dictionary encoding may actually increase the amount of data to be stored. Continuing the example of data about people, using a dictionary to store their ages is no more efficient than storing the ages directly.
Compression, on the other hand, typically uses a coding scheme such as a Huffman code. The data may be analyzed to determine the relative frequency of each datum, with shorter codes assigned to more frequent data and longer codes assigned to less frequent data. Morse code, although not a Huffman code, is a well-known example of a code that uses shorter sequences for more frequent data and longer sequences for less frequent data. For example, the letter "E" may be represented by the sequence "dot" (followed by a space), whereas the letter "J" may be represented by the sequence "dot dash dash dash" (followed by a space). (Because Morse code uses spaces to mark where one symbol ends and the next begins, the sequence for one symbol may be a prefix of the sequence for another: note that "E" is represented by a dot, while "J" begins with a dot but includes further marks. Morse code is therefore not a proper Huffman code. But many people have some familiarity with Morse code, which makes it a more generally useful example of a code that uses shorter codes for more frequent data and longer codes for less frequent data.)
Returning to encoding schemes: once a dictionary has been established, several different encoding schemes may be used to encode the data further. Examples of such encoding schemes include run-length encoding (RLE), bit packing, prefix encoding, cluster encoding, sparse encoding, and indirect encoding; embodiments of the inventive concept may use other encoding schemes as well. Run-length encoding and bit packing are discussed here because they are used in various examples later; information about the other encoding schemes is readily available.
Run-length encoding (RLE) relies on the premise that values often occur in groups. Rather than storing each value separately, a single copy of the value may be stored together with a number representing how many times that value occurs in the data. For example, if the value "2" occurs four times in a row, then rather than storing the value "2" four times (which may use four bytes of storage), the value "2" may be stored together with its number of occurrences ("4") (which may use two bytes of storage). Thus, continuing the example above, the sequence "3, 3, 2, 2, 2, 2, 0, 1, 0, 0, 0, 0, 0, 3" may be represented as "[2,RLE],3,[4,RLE],2,[1,RLE],0,[1,RLE],1,[5,RLE],0,[1,RLE],3". The encoding "[2,RLE],3" may be understood to mean that there is information encoded using RLE: the value is "3", and it is repeated twice; the other RLE encodings are similar. (The reason the representation includes an indicator that RLE encoding was used relates to the potential use of a hybrid encoding scheme, discussed below with reference to FIG. 7.) This sequence may use a total of 12 bytes: for each encoding, one byte stores the number of times the next value is repeated, and one byte stores the value to be repeated.
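A compact sketch of this count-and-value form of RLE follows, using `(count, value)` pairs in place of the bracketed notation. The function names are hypothetical, and the byte-level layout (one byte per count, one per value) is not modelled.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal adjacent values into a (count, value) pair."""
    return [(len(list(group)), value) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (count, value) pairs back into the original sequence."""
    return [value for count, value in runs for _ in range(count)]


sequence = [3, 3, 2, 2, 2, 2, 0, 1, 0, 0, 0, 0, 0, 3]
runs = rle_encode(sequence)
print(runs)  # [(2, 3), (4, 2), (1, 0), (1, 1), (5, 0), (1, 3)]
assert rle_decode(runs) == sequence
```

Six runs at two bytes each give the 12 bytes mentioned in the text.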
Compared with the 14 bytes needed to store the original sequence, 12 bytes may not seem like much of a reduction in the space used to store the data. But proportionally, this encoding reduces the storage needed for this data by about 14%. Even a 14% reduction in the storage used by about 5 GB of data is a significant saving: about 700 megabytes (MB).
As an alternative to storing the number of occurrences of each value, the starting position of each group may be stored. When starting positions are used in place of occurrence counts, the data may be represented as "[0,RLE],3,[2,RLE],2,[6,RLE],0,[7,RLE],1,[8,RLE],0,[13,RLE],3".
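The start-position variant can be sketched the same way, storing `(start, value)` pairs. The function name is hypothetical; note that in this form a decoder would also need the total length to recover the last run, which is an observation about the sketch rather than something the text states.

```python
from itertools import groupby


def rle_encode_with_starts(values):
    """Store, for each run, the position where it starts instead of its length."""
    runs = []
    position = 0
    for value, group in groupby(values):
        runs.append((position, value))
        position += len(list(group))
    return runs


sequence = [3, 3, 2, 2, 2, 2, 0, 1, 0, 0, 0, 0, 0, 3]
print(rle_encode_with_starts(sequence))
# [(0, 3), (2, 2), (6, 0), (7, 1), (8, 0), (13, 3)]
```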
The discussion above describes the case where the value being repeated under RLE fits in a single byte. What if that is not the case: for example, what if the repeated value is "1000" (which needs 10 bits to store)? In that case, RLE may serialize the value into groups of seven bits. The eighth bit of each byte, which may be the most significant bit of that byte, may indicate whether the value continues in another byte.
As an example, consider the value "1000". In binary, the value "1000" may be represented as "11 1110 1000". Because this representation uses 10 bits, the value is too large to store in a single byte. The value may therefore be broken into groups of seven bits (adding leading zeros so that each group contains seven bits): "0000111 1101000". Now the first byte in the sequence may be prefixed with a "1" to indicate that the value it represents continues in the next byte, and the second byte may be prefixed with a "0" to indicate that the value ends with that byte. The bit sequence thus becomes "10000111 01101000". When the system reads this bit sequence, it knows to examine the most significant bit of each byte to determine whether the value continues past that byte or ends with it, and to remove that bit when reassembling the bit sequence into a value. Thus "10000111 01101000" becomes "0000001111101000" (with two additional leading zeros added to bring the representation up to two full bytes), allowing the original value "1000" to be recovered.
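The worked example for the value 1000 presents the most significant 7-bit group first. A minimal sketch of that ordering, with hypothetical function names, is:

```python
def encode_msb_first(value: int) -> bytes:
    """Split value into 7-bit groups, most significant group first.

    Every byte except the last has its top bit set to mark a continuation.
    """
    if value < 0:
        raise ValueError("only unsigned values are supported")
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()  # most significant group first
    return bytes([0x80 | g for g in groups[:-1]] + [groups[-1]])


def decode_msb_first(data: bytes) -> int:
    """Strip the continuation bit from each byte and reassemble the value."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value
    raise ValueError("input ended before the final byte of the value")


encoded = encode_msb_first(1000)
print(" ".join(f"{b:08b}" for b in encoded))  # 10000111 01101000
print(decode_msb_first(encoded))              # 1000
```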
Of course, if one bit of each byte is used to indicate whether the value continues in another byte, that bit cannot be used as part of the value. Thus, even when a value fits in a single byte, an additional bit may be included to indicate that the value does not continue in another byte. In addition, if the value would fit in 8 bits but not in 7 bits (for example, a value between 128 and 255), then two bytes may be needed to represent the complete value when a continuation bit is used (since the most significant bit of the value is shifted into the next group of 7 bits in the encoding).
When RLE is used, bits and/or bytes may be presented in any desired order. For example, bits may be presented from most significant to least significant or from least significant to most significant; bytes may similarly be ordered either way. Thus, for example, where bytes are presented from least significant byte to most significant byte but the bits within each byte are presented from most significant bit to least significant bit, the value "16384" may be encoded using continuation bits as "10000000 10000000 00000001". This bit sequence may be interpreted as follows: the first bit of each byte is the continuation bit (where "1" indicates that the next byte continues the value and "0" indicates that the value does not continue in the next byte). After the continuation bits are removed, what remains is "0000000 0000000 0000001". When the bytes are reordered from most significant byte to least significant byte (and regrouped into conventional groups of eight bits, discarding leading zeros), the value becomes "01000000 00000000", which is the binary representation of "16384".
Bit packing, on the other hand, exploits the fact that a value may need fewer bits than a whole byte. For example, if the values to be stored are 0, 1, 2, and 3, two bits suffice to represent each value. Although a full byte could be used to store each value, doing so means that 75% of the storage is effectively unused. Bit packing takes advantage of this by storing more than one value in a single byte (or sequence of bytes). Bit packing is particularly advantageous when a sequence of values, rather than a single value, is repeated.
As an example, consider the sequence "0, 1, 0, 1, 0, 1, 0, 1", and consider the case where four bits suffice to identify each value uniquely (that is, no value greater than 15 is used). Rather than storing each value separately (requiring a total of eight bytes), the encoding "[4,BP],0,1" may be used. This encoding indicates that a single byte stores four bits representing the value "0" and four bits representing the value "1", and that this byte is then repeated four times. (As with RLE encoding, a bit-packed encoding may include an indicator that the data is encoded using bit packing, for use in a hybrid encoding scheme.) The first byte represents the number of times the data in the group is to be repeated; the second byte stores the values in the group itself. This encoding may use about two bytes to store the data, reducing the storage used for the sequence by about 75%.
When bit packing is used, the data may be packed in any desired manner. For example, when packing the sequence "0, 1" with four bits per value, the sequence may be represented as "00010000" (packing values from least significant bit to most significant bit) or as "00000001" (packing values from most significant bit to least significant bit). Some implementations of bit packing may use either strategy but then reverse the order in which the bits are placed into the stream (so that what is actually the least significant portion passes first). Other techniques for packing bits may also be used.
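The two packing orders mentioned above can be sketched as follows. The function names and the fixed `width` parameter are illustrative assumptions; the result is returned as an integer and formatted as bits for display.

```python
def pack_lsb_first(values, width):
    """Pack values so the first value occupies the least significant bits."""
    packed = 0
    for position, value in enumerate(values):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        packed |= value << (position * width)
    return packed


def pack_msb_first(values, width):
    """Pack values so the first value occupies the most significant bits."""
    packed = 0
    for value in values:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        packed = (packed << width) | value
    return packed


print(format(pack_lsb_first([0, 1], 4), "08b"))  # 00010000
print(format(pack_msb_first([0, 1], 4), "08b"))  # 00000001
```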
Of course, bit packing is not limited to groups that fit in a single byte. As with RLE, values in bit packing may use a bit to indicate whether the value continues in the next byte.
Because encoding and compression both attempt to reduce the amount of space used to store a representation of the data, their benefits may not be multiplicative. Once data has been compacted in one way (for example, by encoding), applying a further compaction scheme (for example, compression) may be less useful. Compression may be applied to data after it has been encoded and may still reduce the storage used to some extent, but the effect of compression on encoded data may be smaller than its benefit on unencoded data. (If every scheme for compacting data could be applied with equal benefit regardless of the data being compacted, one might hope to reduce any data to a ridiculously small size simply by applying compaction schemes repeatedly. A little thought makes it clear that such a result is not realistic in the real world.)
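FIG. 5 shows details of storage device 120 of FIG. 1. In FIG. 5, storage device 120 is shown as an SSD, but embodiments of the inventive concept may support other forms of storage device 120 with appropriate modifications. In FIG. 5, storage device 120 may include host interface layer (HIL) 505, SSD controller 510, and various flash memory chips 515-1 through 515-8 (also termed "flash memory storage"), which may be organized into various channels 520-1 through 520-4. Host interface layer 505 may manage communications between storage device 120 and machine 105 of FIG. 1. These communications may include read requests to read data from storage device 120 and write requests to write data to storage device 120. SSD controller 510 may manage the read and write operations, along with garbage collection and other operations, on flash memory chips 515-1 through 515-8 using a flash memory controller (not shown in FIG. 5).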
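SSD controller 510 may include translation layer 525 (also termed a flash translation layer (FTL)). Translation layer 525 may perform the function of translating logical block addresses (LBAs) provided by machine 105 of FIG. 1 into the physical block addresses (PBAs) where the data is actually stored on SSD 120. In this way, machine 105 of FIG. 1 may refer to data using its own address space without having to know the physical addresses where the data is actually stored on storage device 120. This can be beneficial when, for example, data is updated: because storage device 120 may not update data in place, storage device 120 may invalidate the existing data and write the update to a new PBA on storage device 120. Or, if data is stored in a block selected for garbage collection, the data may be written to a new block on storage device 120 before the block is erased. By updating translation layer 525, machine 105 of FIG. 1 is insulated from where the data is actually stored when the data is moved to a different PBA.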
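SSD controller 510 may also include file-to-block map 530. File-to-block map 530 may specify which blocks store the data of which files, and may be used, for example, when data is stored in a columnar format. File-to-block map 530 may be part of translation layer 525 (in which case it may not be considered a separate component of storage device 120); it may supplement translation layer 525 (for example, translation layer 525 may be used for data occupying relatively few blocks, and the file-to-block map for data occupying relatively many blocks); or it may replace translation layer 525 entirely (in which case translation layer 525 may be absent from SSD controller 510).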
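SSD controller 510 may also include transcoder 420. Embodiments of the inventive concept may, however, place transcoder 420 elsewhere within storage device 120 (for example, transcoder 420 may be implemented somewhere in storage device 120 using a general-purpose processor running suitable software, an FPGA, an ASIC, a GPU, or a GPGPU, among other possibilities), or even outside storage device 120.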
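Storage device 120 may also include in-storage processor 315 of FIG. 3 (not shown in FIG. 5), which may execute instructions governing how data stored on storage device 120 is used. In-storage processor 315 of FIG. 3 may also be used for in-storage computation, performing operations locally on storage device 120 rather than on processor 110 of FIG. 1. Like transcoder 420, in-storage processor 315 of FIG. 3 may be implemented somewhere in storage device 120 using a general-purpose processor running suitable software, an FPGA, an ASIC, a GPU, or a GPGPU, among other possibilities, or even outside storage device 120.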
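Although FIG. 5 shows storage device 120 as including eight flash memory chips 515-1 through 515-8 organized into four channels 520-1 through 520-4, embodiments of the inventive concept may support any number of flash memory chips organized into any number of channels. Similarly, although FIG. 5 shows SSD controller 510 as including transcoder 420 and/or in-storage processor 315 of FIG. 3, embodiments of the inventive concept may place transcoder 420 or in-storage processor 315 of FIG. 3 in arrangements other than the one shown in FIG. 5.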
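FIG. 6 shows details of transcoder 420 of FIG. 4. In FIG. 6, transcoder 420 may receive various inputs (such as an input dictionary, an input stream, and an encoding type) and produce various outputs (such as an output dictionary and an output stream). In brief, transcoder 420 may take an input stream, which may be encoded using the encoding scheme specified by the encoding type, and produce an output stream. (Although the input stream may be encoded, the discussion below assumes the input stream is not compressed: if it is compressed, it may be decompressed before further processing.) The output stream may be encoded using the same encoding scheme as the input stream, or using a different encoding scheme (or both: as discussed below, when a hybrid encoding scheme is used, some data may change from one encoding scheme to another).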
In addition, even if the encoding scheme does not change between the input stream and the output stream, the encoding itself may change. For example, if a particular value is assigned different indices in the input dictionary and the output dictionary, the index values used in the actual data should reflect the change of dictionary. To this end, transcoder 420 may also take the input dictionary and map it to the output dictionary.
As an example of these last two points, consider again the dictionary shown in Table 1 above, and a situation in which host computer 105 of FIG. 1 is interested in data about citizens of the United States of America. Table 1 may be viewed as the input dictionary, because it represents the data received in the input stream. Table 2, on the other hand, may be the output dictionary, representing the data in the output stream. At least three points about Table 2 are worth noting. First, Table 2 includes two entries, compared with the four entries in Table 1. Second, Table 2 includes an entry labeled "Don't Care" (any other name could be used, because host computer 105 of FIG. 1 is currently not interested in the data the corresponding values represent). Third, although "United States of America" has ID 3 in Table 1, it has ID 1 in Table 2. This last point implies that any reference to ID 3 in the input stream may be changed to a reference to ID 1 in the output stream (otherwise the data may be meaningless).
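To carry out these operations, transcoder 420 may include various components: circular buffer 605, stream splitter 610, index mapper 615, current encode buffer 620, previous encode buffer 625, transcoding rules 630, and rule evaluator 635.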
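Circular buffer 605 may receive a stream of data from storage 305 of FIG. 3 within storage device 120 of FIG. 1. Because the full data to be processed may be large (for example, gigabytes or terabytes of data), trying to load all of it at once and process it as a single unit in some memory may be impractical. The input stream may therefore be received as a stream and buffered, allowing the data to be processed in units smaller than the whole data set. Although FIG. 6 shows buffer 605 as a circular buffer, embodiments of the inventive concept may use any type of buffer to store the data received from the input stream.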
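Stream splitter 610 may take data from circular buffer 605 and divide it into chunks, which may then be passed to index mapper 615. A chunk here is a unit of data to be processed by the other components of transcoder 420, and should not be confused with the term "chunk" as used in other contexts (for example, the term "column chunk" used with reference to FIG. 9 below).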
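FIG. 7 shows stream splitter 610 of FIG. 6 dividing input encoded data, which may be part (or all) of the input stream, into chunks. In FIG. 7, the input data is shown as including, among other data, three pieces of encoded data: "[1,BP],3,3", "[4,RLE],2", and "[5,RLE],0". As discussed above, these pieces represent data encoded using the bit-packing and RLE encoding schemes, and together they represent the following (unencoded) sequence of values: "3,3,2,2,2,2,0,0,0,0,0". For each individual encoding, host computer 105 of FIG. 1 may be interested in the data (or part of it) or may not be. Whether host computer 105 of FIG. 1 is interested in the values in each encoding may depend on transcoding rules 630: stream splitter 610 of FIG. 6 may not know what data host computer 105 of FIG. 1 is interested in. Stream splitter 610 of FIG. 6 may therefore divide the input data stream into chunks, each containing a different piece of encoded data. Thus chunk 705-1 may include the encoding "[1,BP],3,3", chunk 705-2 the encoding "[4,RLE],2", and chunk 705-3 the encoding "[5,RLE],0".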
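The relationship between the three encodings of FIG. 7 and the unencoded sequence can be checked with a short sketch. The tuple layout `(count, scheme, values)` and the names `expand` and `FIG7_CHUNKS` are illustrative assumptions, not part of the described device; the sketch works on already-parsed chunks rather than raw bits.

```python
def expand(chunks):
    """Expand parsed chunks into the plain (unencoded) value sequence.

    Each chunk is a hypothetical (count, scheme, values) tuple:
    - "RLE": `values` holds one value, repeated `count` times.
    - "BP":  `values` holds one packed group, repeated `count` times.
    """
    out = []
    for count, scheme, values in chunks:
        if scheme not in ("RLE", "BP"):
            raise ValueError(f"unknown encoding scheme: {scheme}")
        if scheme == "RLE" and len(values) != 1:
            raise ValueError("an RLE chunk carries exactly one value")
        out.extend(list(values) * count)
    return out


# The three encodings shown in FIG. 7.
FIG7_CHUNKS = [(1, "BP", [3, 3]), (4, "RLE", [2]), (5, "RLE", [0])]
print(expand(FIG7_CHUNKS))
```

Under these assumptions, the printed list is the sequence "3,3,2,2,2,2,0,0,0,0,0" given in the text.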
At least two further points about FIG. 7 are worth noting. First, in the example input stream shown in FIG. 7, some data is encoded using bit packing and some using RLE. If all the data were encoded using a single encoding scheme (e.g., RLE), stream splitter 610 of FIG. 6 could determine that fact from the encoding type input to transcoder 420 of FIG. 6. Sometimes, however, a hybrid encoding scheme is used, in which some data may be encoded using one encoding scheme (e.g., RLE) and some using another (e.g., bit packing); the concept also generalizes to hybrid schemes that use more than two encoding schemes. With a hybrid encoding scheme, transcoder 420 of FIG. 6 may not receive an encoding type as input, because that information alone would not tell stream splitter 610 of FIG. 6 which data was encoded with which scheme. Instead, stream splitter 610 of FIG. 6 may determine which encoding scheme was used for each chunk by examining the chunk itself.
One way to determine the encoding scheme used for a particular chunk is to examine the value of a particular bit in the chunk. For example, a columnar storage format may use the least significant bit of the first byte to indicate whether a particular chunk of data is encoded using RLE or bit packing: if the bit is "0", RLE is used, and if the bit is "1", bit packing is used. That bit may then be removed from the byte and the remaining bits logically shifted right by one bit to yield the value used by the encoding.
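For example, consider chunk 705-1, which would include the bit sequence "00000011 00110011". When stream splitter 610 of FIG. 6 reads the first byte, "00000011", it may examine the least significant bit (the final "1"). Because that bit is "1", stream splitter 610 of FIG. 6 may determine that the chunk is encoded using bit packing. Removing the least significant bit and logically shifting the remaining bits of the first byte right by one bit yields the byte "00000001". Because the first (most significant) bit of this byte is "0", stream splitter 610 of FIG. 6 may determine that the value is just "00000001" (the "0" bit indicating that the value does not continue into the next byte may be removed and another leading zero added), indicating that the group (still to be determined) is repeated once. Stream splitter 610 of FIG. 6 may then read the next byte, "00110011". Because the most significant bit of this byte is "0", stream splitter 610 of FIG. 6 knows that the value does not continue into the next byte. Removing the continuation bit and adding a leading zero yields "00110011", which represents the values "3" and "3". Stream splitter 610 of FIG. 6 may thus determine that the encoding uses bit packing to represent the value "3" twice.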
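Consider chunk 705-2, on the other hand, which would include the bit sequence "00001000 00000010". When stream splitter 610 of FIG. 6 reads the first byte, "00001000", it may examine the least significant bit (the final "0"). Because that bit is "0", stream splitter 610 of FIG. 6 may determine that the chunk is encoded using RLE. Removing the least significant bit and logically shifting the remaining bits of the first byte right by one bit yields the byte "00000100". Because the first (most significant) bit of this byte is "0", stream splitter 610 of FIG. 6 may determine that the value is just "00000100" (the "0" bit indicating that the value does not continue into the next byte may be removed and another leading zero added), indicating that the value (still to be determined) is repeated four times. Stream splitter 610 of FIG. 6 may then read the next byte, "00000010". Because the most significant bit of this byte is "0", stream splitter 610 of FIG. 6 knows that the value does not continue into the next byte. Removing the continuation bit and adding a leading zero yields "00000010". Stream splitter 610 of FIG. 6 may thus determine that the encoding uses RLE to represent the value "2" repeated four times.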
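The header analysis walked through for chunks 705-1 and 705-2 can be sketched as follows. This is a minimal illustration that assumes the header fits in a single byte (no continuation into a second byte); the function name `decode_header` is hypothetical.

```python
def decode_header(first_byte: int):
    """Split a one-byte chunk header into (scheme, repeat_count).

    Assumption for this sketch: the least significant bit selects the
    scheme (0 = RLE, 1 = bit packing) and the remaining bits, shifted
    right by one, hold the repeat count. Multi-byte headers are not handled.
    """
    if not 0 <= first_byte <= 0x7F:
        raise ValueError("sketch handles only single-byte headers (MSB = 0)")
    scheme = "BP" if first_byte & 0b1 else "RLE"
    repeat_count = first_byte >> 1
    return scheme, repeat_count


print(decode_header(0b00000011))  # first byte of chunk 705-1
print(decode_header(0b00001000))  # first byte of chunk 705-2
```

With the two first bytes from the text, the sketch reports bit packing with a repeat count of 1 for chunk 705-1 and RLE with a repeat count of 4 for chunk 705-2, matching the walk-through.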
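Of course, stream splitter 610 of FIG. 6 may not perform all of this analysis on any given bit sequence. All stream splitter 610 of FIG. 6 needs to do is read bytes until it encounters one whose most significant bit is "0" (this byte sequence indicates the encoding scheme and the repetition count of the upcoming value), then read bytes until it encounters another one whose most significant bit is "0" (this byte sequence represents the encoded value). Stream splitter 610 of FIG. 6 may then pass the bits read (which represent a whole encoded chunk) to index mapper 615 of FIG. 6 (and on for later processing by rule evaluator 635 of FIG. 6): index mapper 615 of FIG. 6 (and/or rule evaluator 635 of FIG. 6) may perform the analysis described above to determine what encoding scheme the chunk uses and what values are so encoded. If, however, stream splitter 610 of FIG. 6 (or index mapper 615 of FIG. 6, or any other component of the inventive concept) does perform the analysis to determine the encoding scheme of a particular chunk, it may pass the encoding type down the line to the other components so that the analysis is not repeated. This may matter particularly when the bits identifying the encoding scheme are removed from the chunk as it is processed: without the encoding type, components that handle the encoded data later may be unable to process it correctly.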
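The byte-level splitting rule described above (read up to a byte whose most significant bit is "0", then do so once more) can be sketched as below. The function name `split_chunks` is hypothetical, and the bytes used for chunk 705-3 are an assumption derived by applying the same header convention to "[5,RLE],0"; the text gives bit sequences only for chunks 705-1 and 705-2.

```python
def split_chunks(data: bytes):
    """Divide an encoded byte stream into chunks without decoding them.

    Each chunk is two fields (header, then value); a field ends at the
    first byte whose most significant bit is 0.
    """
    chunks = []
    pos = 0
    while pos < len(data):
        start = pos
        for _field in ("header", "value"):
            while True:
                if pos >= len(data):
                    raise ValueError("stream ends in the middle of a chunk")
                byte = data[pos]
                pos += 1
                if (byte & 0x80) == 0:  # MSB 0: last byte of this field
                    break
        chunks.append(data[start:pos])
    return chunks


# 705-1 and 705-2 as given in the text; 705-3 is an assumed encoding.
stream = bytes([0b00000011, 0b00110011,
                0b00001000, 0b00000010,
                0b00001010, 0b00000000])
for chunk in split_chunks(stream):
    print(chunk.hex())
```

Note that the splitter in this sketch never inspects the scheme bit, which mirrors the point that the detailed analysis can be left to later components.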
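Second, note that chunks 705-2 and 705-3 are consecutive chunks that are both encoded using RLE. One might expect stream splitter 610 of FIG. 6 to treat all consecutive RLE encodings as a single chunk (separating chunks wherever the encoding scheme changes). Recall, though, that the goal is to transcode the input stream so that all data that is not of interest is merged into a single "don't care" value, and that stream splitter 610 of FIG. 6 may have no information about what data host computer 105 of FIG. 1 is interested in. If stream splitter 610 of FIG. 6 treated all encodings using the same scheme as one chunk, it could end up mixing data that host computer 105 of FIG. 1 is interested in with data it is not. Moreover, if all the data in the input stream were encoded using the same scheme, the whole input stream would be treated as a single chunk, which would defeat the purpose of stream splitter 610 of FIG. 6 as part of transcoder 420 of FIG. 6.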
Third, although the discussion above focuses on a hybrid encoding scheme that uses one bit to distinguish two different encoding schemes, embodiments of the inventive concept may be generalized to hybrid encoding schemes that use more than two distinct encoding schemes. If more than two encoding schemes are used, more than one bit may be used to distinguish them. For example, two bits may be used to distinguish three or four encoding schemes, three bits to distinguish five, six, seven, or eight encoding schemes, and so on.
(Note also that the bits used to distinguish encoding schemes may serve other purposes as well. For example, consider a case where three encoding schemes are used. If the least significant bit of the first byte has one particular value (e.g., "0"), one encoding scheme, such as RLE, may be used, with the next least significant bit used as part of the value. If the least significant bit of the first byte has the other value (e.g., "1"), the next least significant bit may be used to distinguish between the remaining two encoding schemes (e.g., bit packing and cluster encoding).)
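For the fixed-width case, the number of selector bits follows directly from the number of schemes. The sketch below is only arithmetic; the name `selector_bits` is hypothetical, and it does not model the variable-width arrangement described in the parenthetical note.

```python
def selector_bits(num_schemes: int) -> int:
    """Smallest number of bits that can distinguish `num_schemes` schemes."""
    if num_schemes < 1:
        raise ValueError("need at least one encoding scheme")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    return (num_schemes - 1).bit_length()


for n in (2, 3, 4, 5, 8):
    print(n, "schemes ->", selector_bits(n), "bit(s)")
```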
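Returning to FIG. 6, index mapper 615 may receive chunks from stream splitter 610 and map encoded values from the input dictionary to encoded values in the output dictionary. For example, consider again the dictionaries shown in Tables 1 and 2 above. Because the value corresponding to "United States of America" is of interest, the value "3" may be replaced with the value "1" wherever it appears in an encoded chunk, and every other value may be replaced with the value "0".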
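FIG. 8 shows index mapper 615 of FIG. 6 mapping an input dictionary to an output dictionary. In FIG. 8, index mapper 615 is shown receiving input dictionary 805 and generating output dictionary 810. Index mapper 615 may generate output dictionary 810 given information about what data host computer 105 of FIG. 1 is interested in, and may also generate a mapping from input dictionary 805 to output dictionary 810. Continuing the example above, this mapping may be the one shown in Table 3: index "3" maps to index "1", and all other indices map to index "0".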
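A minimal sketch of how such an output dictionary and index mapping might be built is shown below. The function name `build_output_dictionary` and the dictionary literal `TABLE_1` are illustrative; in particular, the assignment of China to index 0 and India to index 1 is an assumption inferred from the order in which the text mentions them, since Table 1 itself is not reproduced here. The sketch always places "Don't Care" at index 0, as in Table 2.

```python
DONT_CARE = "Don't Care"


def build_output_dictionary(input_dict, interesting_values):
    """Return (output_dict, mapping) for a set of values of interest.

    Index 0 of the output dictionary is reserved for "Don't Care";
    values of interest receive indices 1, 2, ... in input-index order.
    """
    output_dict = {0: DONT_CARE}
    mapping = {}
    next_index = 1
    for index in sorted(input_dict):
        value = input_dict[index]
        if value in interesting_values:
            output_dict[next_index] = value
            mapping[index] = next_index
            next_index += 1
        else:
            mapping[index] = 0  # merged into the single "Don't Care" entry
    return output_dict, mapping


# Assumed contents of Table 1 (ordering of China/India is a guess).
TABLE_1 = {0: "China", 1: "India", 2: "Korea", 3: "United States of America"}

output_dict, mapping = build_output_dictionary(
    TABLE_1, {"United States of America"})
print(output_dict)
print(mapping)
```

Under these assumptions the mapping sends index 3 to index 1 and every other index to 0, which is the behavior attributed to Table 3.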
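Several points about index mapper 615 are worth noting. First, although index mapper 615 is shown as a separate component of transcoder 420 of FIG. 6, it may work together with rule evaluator 635 of FIG. 6 (or be implemented as part of rule evaluator 635). Second, how index mapper 615 generates output dictionary 810 (and the mapping shown in Table 3) may depend on what data host computer 105 of FIG. 1 is interested in. How the index mapper may learn what data host computer 105 of FIG. 1 is interested in is discussed below with reference to FIG. 11. Third, transcoding data may involve both index mapper 615, which maps input dictionary 805 to output dictionary 810, and transcoding rules 630 of FIG. 6: transcoding rules 630 of FIG. 6 may depend on the mapping from input dictionary 805 to output dictionary 810. The converse does not hold: the mapping from input dictionary 805 to output dictionary 810 (and therefore the operation of index mapper 615) may be generated without reference to transcoding rules 630 of FIG. 6.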
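A fourth point about index mapper 615 is more subtle. Index mapper 615 effectively adds a new entry, the "don't care" value, to output dictionary 810. For simplicity of implementation, it makes sense for index mapper 615 always to use the same index for the "don't care" value. Because the size of input dictionary 805 may vary from data set to data set, index "0" may always be used.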
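But what happens if host computer 105 of FIG. 1 turns out to be interested in all the data in the data set? In that case, index mapper 615 has added an entry to output dictionary 810 without removing any, so output dictionary 810 may be larger than input dictionary 805 (by one entry). Consider the case where input dictionary 805 has exactly 2^n entries for some value n, so that each index in input dictionary 805 can be represented using n bits. Adding the "don't care" entry means output dictionary 810 now has 2^n + 1 entries, so n + 1 bits are now needed to represent every possible value in the data set: this problem is called bit overflow. The additional bit may affect the encoded data, which then needs new bits to represent the data properly. A single small change to output dictionary 810 may therefore ripple widely through the representation of the data and may significantly increase the storage used for the encoded data.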
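Although the example above focuses on the case where introducing the "don't care" entry adds a new bit for representing all possible indices in output dictionary 810, a similar problem can arise even when output dictionary 810 does not grow enough to need a new bit. Consider again the input dictionary of Table 1, and suppose host computer 105 of FIG. 1 is interested in citizens of China and India (indices "0" and "1" in Table 1). These indices can be represented with a single bit (one bit can represent the values "0" and "1"). If these values are encoded using bit packing, eight of them can be packed into a single byte. If, however, index "0" in output dictionary 810 is assigned to the "don't care" value, the indices for China and India map to other values (for example, "1" and "2"). Because the value "2" needs two bits, eight values may no longer fit into a single byte: bit overflow has occurred.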
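Several solutions to the bit overflow problem are available. One is to check whether any index in input dictionary 805 represents data that host computer 105 of FIG. 1 is not interested in. If host computer 105 turns out to be interested in all the data in input dictionary 805, there is no need to transcode the input stream at all, and the input stream may be mapped directly to the output stream without modification.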
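This solution, though useful, may not be enough, because bit overflow can still occur in bit packing. To avoid bit overflow in bit packing, the number of bits used to represent any index in output dictionary 810 should be no greater than the number of bits used to represent any index in input dictionary 805. Two possible solutions are described here. One is to assign the highest index in output dictionary 810 to the "don't care" value: that is, first map all indices of interest from input dictionary 805 to output dictionary 810, then use the lowest unused index for the "don't care" value. The other is to identify an index in input dictionary 805 that host computer 105 of FIG. 1 is not interested in and use that index as the "don't care" value. In both solutions, no index in input dictionary 805 is replaced with a larger index in output dictionary 810, which avoids the bit overflow problem. The drawback of these solutions is that the index for "don't care" cannot be chosen independently of input dictionary 805.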
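The second of the two solutions (reusing an index that is not of interest as the "don't care" index) can be sketched as follows, combined with the pass-through check described earlier. The function name `overflow_safe_mapping` is hypothetical, and choosing the lowest uninteresting index is a design choice made in this sketch so that no index is ever mapped to a larger one.

```python
def overflow_safe_mapping(input_indices, interesting):
    """Map input indices to output indices without enlarging any index.

    Indices of interest keep their value; every other index is mapped to
    the lowest uninteresting index, which doubles as "don't care".
    Returns None when every index is of interest (no transcoding needed).
    """
    ordered = sorted(input_indices)
    unwanted = [i for i in ordered if i not in interesting]
    if not unwanted:
        return None  # host wants everything: pass the stream through
    dont_care = unwanted[0]
    return {i: (i if i in interesting else dont_care) for i in ordered}


# The China/India example: indices 0 and 1 are of interest.
mapping = overflow_safe_mapping(range(4), {0, 1})
print(mapping)
```

In this example the indices of interest stay at 0 and 1, so values packed one bit apiece still fit, while indices 2 and 3 both collapse to 2.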
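Returning again to FIG. 6, the current chunk (possibly processed by index mapper 615) may be stored in current encode buffer 620. From there, rule evaluator 635 may evaluate the encoded data in current encode buffer 620 together with the encoded data in previous encode buffer 625, and decide whether the encoding should change and what data should be output to the output stream. In brief, rule evaluator 635 may determine whether the encoded data in current encode buffer 620 can be combined with the encoded data in previous encode buffer 625. If so, the encoded data in current encode buffer 620 may be added to the encoded data in previous encode buffer 625; otherwise, the encoded data in previous encode buffer 625 may be output to the output stream and the encoded data in current encode buffer 620 moved to previous encode buffer 625. (This analysis assumes previous encode buffer 625 holds data. If it does not, as may happen for the first chunk of data, there is no need to try to combine the encoded data in current encode buffer 620 with transcoded data in previous encode buffer 625.)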
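This raises the next question: when can encoded data be combined? The short answer is that two chunks of encoded data may be combined when both represent data that host computer 105 of FIG. 1 is interested in, or both represent data that host computer 105 is not interested in. Some examples may help show how rule evaluator 635 operates. In both examples, the input stream contains the same data, "[1,BP],3,3,[4,RLE],2,[1,BP],0,1,[5,RLE],1,[1,BP],3", and the input dictionary is the one shown in Table 1. In both examples, each table row is a "snapshot" of the contents of current encode buffer 620 and previous encode buffer 625, and of what has been output to the output stream so far.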
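In the first example, host computer 105 of FIG. 1 has requested data about citizens of the United States of America. As Table 1 shows, the index for "United States of America" is "3". The output dictionary may therefore be as shown in Table 2.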
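As shown in row 1 of Table 4, the first chunk processed by rule evaluator 635 is "[1,BP],3,3". Because this chunk includes data of interest (the value "3"), rule evaluator 635 may use the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8 to replace the value "3" with the value "1". The transcoded chunk may then be moved to previous encode buffer 625 (as shown in row 2 of Table 4).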
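In row 2 of Table 4, the second chunk processed by rule evaluator 635 is "[4,RLE],2". Because this chunk includes no data of interest (the value "2"), rule evaluator 635 may use the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8 to replace the value "2" with the value "0" (indicating "don't care" data). Because this chunk contains "don't care" data while previous encode buffer 625 contains data of interest, the data in previous encode buffer 625 may be output to the output stream and the current transcoded chunk moved to previous encode buffer 625 (both as shown in row 3 of Table 4).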
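In row 3 of Table 4, the third chunk processed by rule evaluator 635 is "[1,BP],0,1". Because this chunk includes no data of interest (the values "0" and "1"), rule evaluator 635 may use the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8 to replace the values "0" and "1" with the value "0" (indicating "don't care" data).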
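Because this chunk contains "don't care" data and previous encode buffer 625 already contains "don't care" data, the two chunks may be combined. Because this chunk uses bit packing while the chunk in previous encode buffer 625 uses RLE, one of the two encoding schemes may be replaced with the other. In this example, the bit-packed data may be transcoded to RLE. (If two or more values are stored in a single group using bit packing, the whole group is repeated, so the number of repeated values is a multiple of the number of packed values. RLE, by contrast, repeats a single value.) Previous encode buffer 625 may therefore now store "[6,RLE],0" (as shown in row 4 of Table 4), which combines the four "don't care" values from the second chunk with the two "don't care" values from the third chunk.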
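In row 4 of Table 4, the fourth chunk processed by rule evaluator 635 is "[5,RLE],1". Because this chunk includes no data of interest (the value "1"), rule evaluator 635 may use the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8 to replace the value "1" with the value "0" (indicating "don't care" data).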
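Because this chunk contains "don't care" data and previous encode buffer 625 already contains "don't care" data, the two chunks may be combined. Both chunks use RLE to encode the same "don't care" value, so rule evaluator 635 may combine them by increasing the repetition count of the chunk in previous encode buffer 625. Previous encode buffer 625 may therefore now store "[11,RLE],0" (as shown in row 5 of Table 4), which combines the four "don't care" values from the second chunk, the two from the third chunk, and the five from the fourth chunk.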
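In row 5 of Table 4, the fifth chunk processed by rule evaluator 635 is "[1,BP],3". Because this chunk includes data of interest (the value "3"), rule evaluator 635 may use the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8 to replace the value "3" with the value "1". Note that because this transcoded chunk contains data of interest while previous encode buffer 625 contains "don't care" data, the transcoded chunk may not be combined with the chunk in previous encode buffer 625.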
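At this point, the transcoded data in previous encode buffer 625 would normally be output to the output stream and the current transcoded chunk moved to previous encode buffer 625. Because the current transcoded chunk is the last chunk in the input stream, however, both transcoded chunks may be output (the chunk in previous encode buffer 625 first, of course). Row 6 of Table 4 shows the final output.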
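In the second example, host computer 105 of FIG. 1 has requested data about citizens of Korea. As Table 1 shows, the index for "Korea" is "2". The output dictionary is therefore as shown in Table 5.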
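As shown in row 1 of Table 6, the first chunk processed by rule evaluator 635 is "[1,BP],3,3". Because this chunk includes data that is not of interest (the value "3"), rule evaluator 635 may use the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8 to replace the value "3" with the value "0" (indicating "don't care" data). The transcoded chunk may then be moved to previous encode buffer 625 (as shown in row 2 of Table 6).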
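In row 2 of Table 6, the second chunk processed by rule evaluator 635 is "[4,RLE],2". Because this chunk includes data of interest (the value "2"), rule evaluator 635 may use the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8 to replace the value "2" with the value "1". Because this chunk contains data of interest while previous encode buffer 625 contains data that is not of interest, the data in previous encode buffer 625 may be output to the output stream and the current transcoded chunk moved to previous encode buffer 625 (both as shown in row 3 of Table 6).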
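In row 3 of Table 6, the third chunk processed by rule evaluator 635 is "[1,BP],0,1". Because this chunk includes no data of interest (the values "0" and "1"), rule evaluator 635 may use the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8 to replace the values "0" and "1" with the value "0" (indicating "don't care" data). Because this chunk contains data that is not of interest while previous encode buffer 625 contains data of interest, the data in previous encode buffer 625 may be output to the output stream and the current transcoded chunk moved to previous encode buffer 625 (both as shown in row 4 of Table 6).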
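In row 4 of Table 6, the fourth chunk processed by rule evaluator 635 is "[5,RLE],1". Because this chunk includes no data of interest (the value "1"), rule evaluator 635 may use the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8 to replace the value "1" with the value "0" (indicating "don't care" data).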
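Because this chunk contains "don't care" data and previous encode buffer 625 contains "don't care" data, the two chunks may be combined. Because this chunk uses RLE while the chunk in previous encode buffer 625 uses bit packing, one of the two encoding schemes may be replaced with the other. In this example, the bit-packed data may be transcoded to RLE. (Again, RLE may be chosen because it repeats a single value rather than a group of values.) Previous encode buffer 625 therefore now stores "[7,RLE],0" (as shown in row 5 of Table 6), which combines the two "don't care" values from the third chunk with the five "don't care" values from the fourth chunk.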
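In row 5 of Table 6, the fifth chunk processed by rule evaluator 635 is "[1,BP],3". Because this chunk includes no data of interest (the value "3"), rule evaluator 635 may use the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8 to replace the value "3" with the value "0" (the "don't care" value).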
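Because this chunk contains "don't care" data and previous encode buffer 625 contains "don't care" data, the two chunks may be combined. Because this chunk uses bit packing while the chunk in previous encode buffer 625 uses RLE, one of the two encoding schemes may be replaced with the other. In this example, the bit-packed data may be transcoded to RLE. Previous encode buffer 625 therefore now stores "[8,RLE],0", which combines the two "don't care" values from the third chunk, the five from the fourth chunk, and the one from the fifth chunk.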
Finally, because the fifth chunk is the last chunk of the input stream, rule evaluator 635 may output the transcoded data in previous encode buffer 625. Row 6 of Table 6 shows the final output.
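Both walk-throughs can be reproduced with a small simulation of the index mapping and the combine-or-flush decision. This is a simplified sketch, not the rule set of Table 7: it works on parsed `(count, scheme, values)` tuples, treats a chunk as "don't care" only when all of its mapped values are 0, merges adjacent "don't care" chunks into RLE, and never merges chunks of interest (one of the embodiments described). All names are hypothetical.

```python
DONT_CARE = 0


def is_dont_care(chunk):
    """True when every value in the chunk maps to the 'don't care' index."""
    return all(v == DONT_CARE for v in chunk[2])


def value_count(chunk):
    """Number of unencoded values a chunk stands for."""
    count, scheme, values = chunk
    return count if scheme == "RLE" else count * len(values)


def transcode(chunks, mapping):
    """Remap indices, then merge runs of adjacent 'don't care' chunks."""
    output = []
    previous = None  # plays the role of the previous encode buffer
    for count, scheme, values in chunks:
        current = (count, scheme, [mapping[v] for v in values])
        if (previous is not None and is_dont_care(previous)
                and is_dont_care(current)):
            total = value_count(previous) + value_count(current)
            previous = (total, "RLE", [DONT_CARE])
        else:
            if previous is not None:
                output.append(previous)  # flush to the output stream
            previous = current
    if previous is not None:
        output.append(previous)  # end of input: flush what is buffered
    return output


INPUT_STREAM = [(1, "BP", [3, 3]), (4, "RLE", [2]), (1, "BP", [0, 1]),
                (5, "RLE", [1]), (1, "BP", [3])]
USA_MAPPING = {0: 0, 1: 0, 2: 0, 3: 1}    # first example
KOREA_MAPPING = {0: 0, 1: 0, 2: 1, 3: 0}  # second example

print(transcode(INPUT_STREAM, USA_MAPPING))
print(transcode(INPUT_STREAM, KOREA_MAPPING))
```

Under these assumptions, the first call ends with an "[11,RLE],0" chunk between two chunks of interest, and the second ends with "[8,RLE],0", in line with the intermediate buffer contents described in the two examples.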
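Neither of the two examples above shows a case where consecutive chunks include data of interest. Embodiments of the inventive concept may handle such cases in different ways. In one embodiment, whenever current encode buffer 620 contains data of interest, any chunk in previous encode buffer 625 may be output to the output stream (that is, if current encode buffer 620 contains data of interest, no attempt is made to combine the data in current encode buffer 620 with the data in previous encode buffer 625). In another embodiment, the chunk in current encode buffer 620 and the chunk in previous encode buffer 625 may be combined. In such embodiments, however, whether the combination is feasible may depend on whether the values of interest are the same. For example, if one chunk includes data about citizens of China and the other includes data about citizens of Korea, the chunks may or may not be combined, depending on the embodiment. If both chunks include data about citizens of Korea, on the other hand, combining them may be feasible.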
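Rule evaluator 635 may use transcoding rules 630 to determine what data is and is not of interest, what data may be output and what data may be stored in previous encode buffer 625, and whether a chunk may be transcoded from one encoding scheme to another. The exact rules may vary with the encoding schemes the data may use.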
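As noted above, rule evaluator 635 may also include index mapper 615. In embodiments of the inventive concept where rule evaluator 635 includes index mapper 615, rule evaluator 635 may apply index mapper 615 to the contents of current encode buffer 620 before applying transcoding rules 630.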
Table 7 shows some rules that may be used when the encoding scheme in use may be RLE or bit packing. In embodiments of the inventive concept where other encoding schemes may be used, the rules may vary accordingly: all such variations are considered embodiments of the inventive concept. In addition, embodiments of the inventive concept may include rules governing the transcoding of data among more than two types of encoding scheme. For example, a hybrid encoding scheme may use three different encoding schemes: transcoding rules 630 of FIG. 6 may then specify how to transcode the data when current encode buffer 620 of FIG. 6 and previous encode buffer 625 of FIG. 6 contain data encoded using any pair of different encoding schemes.
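In Table 7, P denotes data that host computer 105 of FIG. 1 may be interested in, and DC denotes data that host computer 105 may not be interested in. (How data may be identified as of interest or not is discussed further below with reference to FIG. 11.) Where variables (such as x, y, or z) are used, they may denote counts of values that host computer 105 of FIG. 1 is or is not interested in. For example, the expression "[g,BP]P(x),DC(y),P(z)" (as used in rules 7 and 9) may indicate that the data is encoded using bit packing: the group includes x values of interest at its start, y values not of interest in its middle, and z values of interest at its end. The values x, y, z, g, and G are expected to satisfy the following constraints: g * G = x + y + z, 1 ≤ g ≤ 63, x mod G = 0, y mod G = 0, z mod G = 0, y ≠ 0, and a bound relating y to 16 divided by the number of bits per packed value (the comparison operator for this bound did not survive in this text). Finally, PEB (in the output column) may indicate that whatever is stored in previous encode buffer 625 when the rule is selected for application may be output to the output stream. Table 7 also assumes that all data has already been mapped by index mapper 615 and therefore contains values corresponding to output dictionary 810 of FIG. 8.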
The discussion above explains how data may be transcoded in general. When data is stored in a columnar format, however, the columnar format may be exploited to benefit transcoding. Before that benefit can be explained, it helps to understand the columnar format. For ease of explanation, the columnar format is described with reference to an SSD, but embodiments of the inventive concept may include other storage devices that can use a columnar format.
FIG. 9 shows an example file stored in a columnar format. The file shown in FIG. 9 may include file metadata 905 and column chunks 910-1, 910-2, and 910-3. Although FIG. 9 shows three column chunks 910-1 through 910-3, embodiments of the inventive concept may include any number (zero or more) of column chunks, without limitation.
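File metadata 905 may include metadata relating to the file. Although other metadata may also be stored, FIG. 9 shows file metadata 905 as including file-to-block map 915 and dictionary page 920. Dictionary page 920 may be a dictionary used to encode values within the file's data, such as the dictionary shown in Table 1 above. Dictionary page 920 may also store multiple dictionaries usable for encoding different data within the file: for example, one dictionary may store country names while another stores surnames.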
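File-to-block map 915 may identify the blocks that store the individual column chunks 910-1, 910-2, and 910-3, and their relative order. File-to-block map 915 may also specify the order of the data pages within each of column chunks 910-1, 910-2, and 910-3, or the page order may be specified within column chunks 910-1, 910-2, and 910-3. File-to-block map 915 may resemble file-to-block map 530 of FIG. 5, except that file-to-block map 530 may provide information about which blocks store each file stored on storage device 120 of FIG. 1, whereas file-to-block map 915 may provide information about which blocks store the file shown in FIG. 9. (The two file-to-block maps may of course be used together: file-to-block map 530 of FIG. 5 may be used to locate the block storing file metadata 905 for each file, and file-to-block map 915 in file metadata 905 may then be used to locate the blocks storing the file's column chunks.)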
In general, a single column chunk may span multiple blocks, and a single block may store multiple column chunks. As long as there is some way to identify where data is stored and what that data represents (for example, which file contains it), more general data storage arrangements pose little difficulty. For purposes of this discussion, however, consider the case where a column chunk fits in a single block and blocks do not share column chunks. Each of column chunks 910-1, 910-2, and 910-3 may therefore be stored in a separate block.
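Within column chunk 910-1 (column chunks 910-2 and 910-3 are similar) there may be dictionary page 925 and data pages 930-1, 930-2, and 930-3. Although FIG. 9 shows three data pages, embodiments of the inventive concept may include any number (zero or more) of data pages in a column chunk. The data pages may store the actual data of the file, divided into units that fit into individual pages.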
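Dictionary page 925 may store a dictionary for the data within column chunk 910-1. Like dictionary page 920, dictionary page 925 may store multiple dictionaries usable for encoding different data within the file.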
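A question may arise as to why FIG. 9 shows both dictionary page 920 and dictionary page 925. The reason is that dictionary pages 920 and 925 may be used in different implementations of the columnar format. For example, one columnar storage format may use a single dictionary for the whole file, which may be stored in dictionary page 920. Another columnar format may use a separate dictionary page 925 in each of column chunks 910-1, 910-2, and 910-3. The advantage of using dictionary page 925 is that if a particular column chunk does not use a dictionary, or does not use certain values in its data, that information may be omitted from dictionary page 925, reducing its size (or eliminating dictionary page 925 altogether). On the other hand, multiple dictionary pages 925 in different column chunks may lead to duplicated data: the same dictionary entry may be used in several column chunks. This is why dictionary pages 920 and 925 are drawn with dashed lines: either may be omitted, depending on the columnar storage format in use. (Indeed, a file may not use a dictionary at all, in which case both dictionary pages 920 and 925 may be omitted.)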
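Now that the columnar format has been described, the adaptations for using transcoder 420 of FIG. 4 in a storage device that uses a columnar format can be described. FIG. 10 shows storage device 120 of FIG. 1 configured to implement transcoding where data is stored in a columnar format. In FIG. 10, storage device 120 may include host interface layer 505, storage device controller 510, and storage 515, which function similarly to those described above with reference to FIG. 5 (again, storage device 120 may be an SSD, a hard disk drive, or any other storage device that can use a columnar format).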
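Storage device 120 may also include in-storage compute controller 1005, column chunk processor 1010, and in-storage compute 315. In-storage compute controller 1005 may manage what information is sent to in-storage compute 315 and to column chunk processor 1010. For example, when host computer 105 of FIG. 1 asks storage device 120 to perform an acceleration function (such as counting the citizens of a particular country), in-storage compute controller 1005 may provide column chunk processor 1010 with a predicate (identifying the country of interest). In-storage compute controller 1005 may also access data (specifically, column chunks) from storage 515 and provide it to column chunk processor 1010. In-storage compute controller 1005 may also determine the encoding scheme used in the data (assuming a single encoding scheme, rather than a hybrid one, is used for the column chunk or the whole file) and provide the encoding type to column chunk processor 1010. Finally, depending on the request from host computer 105 of FIG. 1, in-storage compute controller 1005 may handle the transcoded data returned by the column chunk processor and pass it back to host computer 105 of FIG. 1 (via host interface layer 505) or on to in-storage compute 315. The structure and operation of column chunk processor 1010 are discussed below with reference to FIG. 11.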
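In-storage compute controller 1005 and column chunk processor 1010 may be implemented using a suitably programmed general-purpose processor, an FPGA, an ASIC, a GPU, or a GPGPU, among other possibilities. They may be implemented using the same hardware or different hardware (for example, in-storage compute controller 1005 may be implemented as an ASIC and column chunk processor 1010 as an FPGA), and as a single unit or as separate components.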
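FIG. 11 shows column chunk processor 1010 of FIG. 10 configured to implement transcoding where data is stored in a columnar format. In FIG. 11, column chunk processor 1010 may receive an input stream, an encoding type, and a predicate as inputs, and may produce an output stream as output. The input stream may be stored in input buffer 1105, and may be a single data page in a column chunk or all the data in a column chunk. Data from input buffer 1105 may then be provided as the input stream to transcoder 420 (described above with reference to FIG. 6); transcoder 420 may also receive the encoding type from in-storage compute controller 1005 of FIG. 10, as discussed with reference to FIG. 10. Note that because transcoder 420 may include circular buffer 605 of FIG. 6, input buffer 1105 may be omitted: the data may be stored in circular buffer 605 of FIG. 6, where stream splitter 610 of FIG. 6 may operate on it. In some embodiments of the inventive concept, however, circular buffer 605 of FIG. 6 may not be large enough to store a whole data page or column chunk (or the input stream may supply data faster than data is removed from circular buffer 605 of FIG. 6), in which case input buffer 1105 may serve as a temporary repository for data that does not immediately fit into circular buffer 605 of FIG. 6.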
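The output of transcoder 420 (the output stream described above with reference to FIG. 6) may be stored in output buffer 1110. Again, although data may be sent directly to its destination as transcoder 420 produces it, sending data in particular units (such as a complete data page or column chunk) may be useful. In such cases, output buffer 1110 may store the output stream until a suitable unit of data has been produced. At that point, depending on the transcoding operation requested, column chunk processor 1010 may send the output stream to in-storage compute controller 1005 of FIG. 10 or to host computer 105 of FIG. 1.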
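Index mapper 615 (shown in FIG. 11 as outside transcoder 420, although index mapper 615 may be part of transcoder 420 as shown in FIG. 6) may receive information from predicate evaluator 1115 and "don't care" evaluator 1120. Predicate evaluator 1115 may receive the predicate from in-storage compute controller 1005 of FIG. 10 and use it to determine what data is of interest. Predicate evaluator 1115 may use comparison operators to identify which values in input dictionary 805 of FIG. 8 (which may be either of dictionary pages 920 and 925 of FIG. 9) host computer 105 of FIG. 1 is interested in. "Don't care" evaluator 1120 may operate similarly (but in mirror image) to identify what data is not of interest. Note that because predicate evaluator 1115 and "don't care" evaluator 1120 are complementary in operation, one of the two may be used alone (any data that does not meet one evaluator's criteria fits the other's), so either predicate evaluator 1115 or "don't care" evaluator 1120 may be omitted. This information may be provided by predicate evaluator 1115 and "don't care" evaluator 1120 to index mapper 615, enabling index mapper 615 to build the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8.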
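As an example, consider again a query from host computer 105 of FIG. 1 to count the entries in a data set that involve citizens of the United States of America. When the query arrives, a predicate may be extracted (for example, "citizenship = United States of America": the exact format of the predicate may depend on the format of the data set and on the application used to submit the query). Input dictionary 805 of FIG. 8 (such as the dictionary shown in Table 1) may be consulted to replace "United States of America" with the value "3". The predicate provided to index mapper 615 may therefore specify "citizenship = 3", after which index mapper 615 may generate output dictionary 810 of FIG. 8 (such as the dictionary shown in Table 2) and the mapping shown in Table 3.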
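The step of resolving a predicate against the input dictionary can be sketched with Python's standard `operator` module. The function name `matching_indices` and the contents of `TABLE_1` are illustrative assumptions (Table 1 is not reproduced here, and the China/India ordering is a guess); a real predicate format would depend on the data set and the querying application.

```python
import operator


def matching_indices(input_dict, compare, operand):
    """Return the dictionary indices whose value satisfies the predicate.

    `compare` is a two-argument comparison such as operator.eq, standing in
    for the comparison operators a predicate evaluator might apply.
    """
    return {index for index, value in input_dict.items()
            if compare(value, operand)}


TABLE_1 = {0: "China", 1: "India", 2: "Korea", 3: "United States of America"}

# "citizenship = United States of America" resolves to "citizenship = 3".
of_interest = matching_indices(TABLE_1, operator.eq, "United States of America")
# The complementary "don't care" set is everything else.
not_of_interest = set(TABLE_1) - of_interest
print(of_interest, not_of_interest)
```

The complement computed on the last lines reflects the observation that only one of the two evaluators is strictly needed.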
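Note that the result of predicate evaluator 1115 may also be provided to transcoder 420 for use in constructing transcoding rules 630 of FIG. 6. Because transcoding rules 630 of FIG. 6 may depend on knowing what data host computer 105 of FIG. 1 is interested in, the result of predicate evaluator 1115 may be used to adapt transcoding rules 630 of FIG. 6. For example, consider again the rules shown in Table 7. The result of predicate evaluator 1115 (or even the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8, such as the mapping shown in Table 3) may be used to establish appropriate values for P and DC in the various rules.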
Note also that in FIG. 11 the predicate applies to whatever data is input to transcoder 420 as the input stream. Although it is reasonable to conclude that the predicate will be applied to the whole data set that host computer 105 of FIG. 1 queries, transcoder 420 treats the input stream as complete even though it may represent only part of the data set. For example, column chunk processor 1010 may use transcoder 420 to process each of data pages 930-1, 930-2, and 930-3 of FIG. 9 as its own "input stream". Because transcoder 420 does not know what the input stream represents, this poses no problem.
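FIGs. 12A through 12C show a flowchart of an example procedure for transcoder 420 of FIGs. 4 and 6 to transcode data, according to an embodiment of the inventive concept. In FIG. 12A, at block 1205, transcoder 420 of FIG. 6 may check whether any data remains to be received from the input stream. In general the input stream may come from any source but, as discussed above with reference to FIGs. 9 through 11, when data is stored in a columnar format the input stream may be a data page in a column chunk. If no data remains to be received from the input stream, then at block 1210 transcoder 420 of FIG. 6 may check whether any transcoded data remains in previous encode buffer 625 or current encode buffer 620 of FIG. 6. If so, the transcoded data in previous encode buffer 625 of FIG. 6 is output to the output stream, followed by the transcoded data in current encode buffer 620 of FIG. 6. In most cases current encode buffer 620 of FIG. 6 should hold no data, because rule evaluator 635 operates on the data in current encode buffer 620. Even where applying transcoding rules 630 of FIG. 6 (for example, rules 6 through 9 of Table 7) leaves data in current encode buffer 620 of FIG. 6, rule evaluator 635 of FIG. 6 will operate on that data first, before transcoder 420 of FIG. 6 looks for new data from the input stream (via circular buffer 605 and stream splitter 610 of FIG. 6): transcoder 420 of FIG. 6 may wait for current encode buffer 620 of FIG. 6 to be emptied before trying to process the next chunk of data from the input stream. Nevertheless, if transcoded data does remain in current encode buffer 620 of FIG. 6, it may be output to the output stream. Once all data has been output to the output stream at block 1215, processing may end (until transcoder 420 of FIG. 6 is asked to process a new input stream).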
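Assuming data remains to be processed in the input stream, at block 1220 circular buffer 605 of FIG. 6 may receive the next encoded data from the input stream, after which stream splitter 610 of FIG. 6 may identify the first chunk in the encoded data and forward it to index mapper 615 of FIG. 6. (In embodiments of the inventive concept where index mapper 615 of FIG. 6 is actually part of rule evaluator 635 of FIG. 6, stream splitter 610 of FIG. 6 may place the chunk of encoded data in current encode buffer 620 of FIG. 6.) At block 1225, index mapper 615 of FIG. 6 (or rule evaluator 635 of FIG. 6) may determine whether the chunk is of interest: more specifically, whether the chunk includes data requested by host computer 105 of FIG. 1 (for example, according to the predicate).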
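If the chunk of encoded data includes data that host computer 105 of FIG. 1 is interested in, then at block 1230 (FIG. 12B) index mapper 615 of FIG. 6 (or rule evaluator 635 of FIG. 6) may re-encode any data in the chunk using the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8. At block 1235, rule evaluator 635 of FIG. 6 may check whether host computer 105 of FIG. 1 is interested in any transcoded data in previous encode buffer 625 of FIG. 6. If not (recall that, as determined at block 1225 of FIG. 12A, host computer 105 of FIG. 1 is interested in the current chunk), then at block 1240 transcoder 420 of FIG. 6 may output the transcoded data in previous encode buffer 625 of FIG. 6 to the output stream, and at block 1245 transcoder 420 of FIG. 6 may store the current transcoded chunk in previous encode buffer 625 of FIG. 6, after which processing may return to block 1205 of FIG. 12A.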
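If, on the other hand, previous encode buffer 625 of FIG. 6 also stores data that host computer 105 of FIG. 1 is interested in, as determined at block 1235, then at block 1250 rule evaluator 635 of FIG. 6 may determine whether the current chunk and the transcoded chunk in previous encode buffer 625 of FIG. 6 use the same encoding scheme. If not, then at block 1255 rule evaluator 635 of FIG. 6 may change the encoding scheme used by one of the chunks (the chunk in current encode buffer 620 of FIG. 6 or the chunk in previous encode buffer 625 of FIG. 6). (Where more than two encoding schemes are in use, rule evaluator 635 of FIG. 6 may change the encoding scheme used by both the chunk in current encode buffer 620 of FIG. 6 and the chunk in previous encode buffer 625 of FIG. 6.) Then, with the chunk in current encode buffer 620 of FIG. 6 and the chunk in previous encode buffer 625 of FIG. 6 known to use the same encoding scheme, at block 1260 rule evaluator 635 of FIG. 6 may combine the two chunks into a single chunk, which may be stored in previous encode buffer 625 of FIG. 6, after which processing may return to block 1205 of FIG. 12A.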
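Note that FIG. 12B shows that the current chunk may be transcoded twice: once at block 1230 (when its values are updated to correspond to output dictionary 810 of FIG. 8) and once at block 1255 (when its encoding scheme is changed from one scheme to another, if it is changed). These two operations may be performed separately, but they may also be combined, changing the encoding scheme and updating the values at the same time. Embodiments of the inventive concept include performing the two operations separately and as a single step.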
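Recall that FIG. 12B describes the operations performed when host computer 105 of FIG. 1 is interested in the current chunk (as determined at block 1225 of FIG. 12A). If host computer 105 of FIG. 1 is not interested in the current chunk (again, as determined at block 1225 of FIG. 12A), then at block 1265 (FIG. 12C) index mapper 615 of FIG. 6 (or rule evaluator 635 of FIG. 6) may re-encode any data in the chunk using the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8 (specifically, re-encoding it as the "don't care" value). At block 1270, rule evaluator 635 of FIG. 6 may check whether host computer 105 of FIG. 1 is interested in any transcoded data in previous encode buffer 625 of FIG. 6. If so (recall that, as determined at block 1225 of FIG. 12A, host computer 105 of FIG. 1 is not interested in the current chunk), then at block 1275 transcoder 420 of FIG. 6 may output the transcoded data in previous encode buffer 625 of FIG. 6 to the output stream, and at block 1280 transcoder 420 of FIG. 6 may store the current transcoded chunk in previous encode buffer 625 of FIG. 6, after which processing may return to block 1205 of FIG. 12A.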
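If, on the other hand, previous encode buffer 625 of FIG. 6 also stores data that host computer 105 of FIG. 1 is not interested in, as determined at block 1270, then at block 1285 rule evaluator 635 of FIG. 6 may determine whether the current chunk and the transcoded chunk in previous encode buffer 625 of FIG. 6 use the same encoding scheme. If not, then at block 1290 rule evaluator 635 of FIG. 6 may change the encoding scheme used by one of the chunks (the chunk in current encode buffer 620 of FIG. 6 or the chunk in previous encode buffer 625 of FIG. 6). (Where more than two encoding schemes are in use, rule evaluator 635 of FIG. 6 may change the encoding scheme used by both the chunk in current encode buffer 620 of FIG. 6 and the chunk in previous encode buffer 625 of FIG. 6.) Then, with the chunk in current encode buffer 620 of FIG. 6 and the chunk in previous encode buffer 625 of FIG. 6 known to use the same encoding scheme, at block 1295 rule evaluator 635 of FIG. 6 may combine the two chunks into a single chunk, which may be stored in previous encode buffer 625 of FIG. 6, after which processing may return to block 1205 of FIG. 12A.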
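Note that FIG. 12C shows that the current chunk may be transcoded twice: once at block 1265 (when its values are updated to correspond to output dictionary 810 of FIG. 8) and once at block 1290 (when its encoding scheme is changed from one scheme to another, if it is changed). These two operations may be performed separately, but they may also be combined, changing the encoding scheme and updating the values at the same time. Embodiments of the inventive concept include performing the two operations separately and as a single step.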
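FIGs. 12A through 12C make an implicit assumption: that previous encode buffer 625 of FIG. 6 holds some data. Blocks 1235 and 1270, for example, describe cases where previous encode buffer 625 of FIG. 6 holds some data. This is usually a reasonable assumption, because transcoded data may be buffered in previous encode buffer 625 of FIG. 6 to support combining chunks that can be combined (once data has been output to the output stream, it is too late to try to combine chunks). There may, however, be cases where previous encode buffer 625 of FIG. 6 holds no data. As one example, when the first chunk of the input stream is processed, previous encode buffer 625 holds no data (because no data from that input stream has been processed yet). As a second example, there may be encoding schemes that do not support combining chunks, in which case there is little value in storing the previous chunk in previous encode buffer 625 of FIG. 6. If previous encode buffer 625 of FIG. 6 holds no data, there is no need to compare the current chunk with a (nonexistent) chunk in previous encode buffer 625 of FIG. 6 or to output a (nonexistent) chunk from it. The simple solution is that, when previous encode buffer 625 of FIG. 6 holds no data, any operation that would depend on data being present in previous encode buffer 625 may be skipped. Thus, for example, in FIG. 12B, if previous encode buffer 625 holds no data, processing may jump directly from block 1230 to block 1245 (to buffer the current transcoded chunk in previous encode buffer 625 of FIG. 6), and in FIG. 12C processing may jump directly from block 1265 to block 1280 (to buffer the current transcoded chunk in previous encode buffer 625 of FIG. 6).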
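A close look at FIGs. 12B and 12C shows that they differ relatively little. The notable differences are in blocks 1230 and 1265, and in the different branches leaving blocks 1235 and 1270. Even these differences are relatively minor: blocks 1230 and 1265 both concern re-encoding based on output dictionary 810 of FIG. 8 (block 1265 simply calls out the use of the "don't care" value specifically). And although the branches leaving blocks 1235 and 1270 are labeled differently, both blocks concern determining whether the current chunk can be combined with the previous chunk. FIGs. 12B and 12C could therefore in principle be combined, at some cost to the clarity of the sequence of operations.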
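FIG. 13 shows a flowchart of an example procedure for stream splitter 610 of FIG. 6 to divide input encoded data into chunks. In FIG. 13, at block 1305, stream splitter 610 of FIG. 6 may receive input encoded data (which may originate from storage 305 of FIG. 3 within storage device 120 of FIG. 1); the input encoded data may be buffered in a buffer (such as input buffer 1105 of FIG. 11 or circular buffer 605 of FIG. 6). At block 1310, stream splitter 610 of FIG. 6 may divide the input encoded data into chunks. At block 1315, stream splitter 610 of FIG. 6 may send the chunks to transcoder 420 of FIG. 6 (or to index mapper 615 of FIG. 6 or current encode buffer 620 of FIG. 6).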
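FIGs. 14A and 14B show a flowchart of an example procedure for column chunk processor 1010 of FIG. 10 and/or transcoder 420 of FIGs. 4 and 6 to transcode data stored in a columnar format, according to an embodiment of the inventive concept. In at least one embodiment, FIGs. 14A and 14B may also be viewed as elaborating on how stream splitter 610 of FIG. 6 may receive the input encoded data, as described at block 1305 of FIG. 13.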
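In FIG. 14A, at block 1405, column chunk processor 1010 of FIG. 10 may access the file's file-to-block map 915 of FIG. 9 (or, alternatively or additionally, file-to-block map 530 of FIG. 5). At block 1410, column chunk processor 1010 of FIG. 10 may use file-to-block map 915 of FIG. 9 to locate file metadata 905 of FIG. 9 and thereby identify dictionary page 920 of FIG. 9, which serves as the input dictionary. If each of column chunks 910-1, 910-2, and 910-3 of FIG. 9 includes its own dictionary page 925 of FIG. 9, dictionary page 920 of FIG. 9 may be omitted from file metadata 905 of FIG. 9, in which case block 1410 may be skipped, as shown by dashed line 1415. Then, using file-to-block map 915 of FIG. 9, at block 1420 column chunk processor 1010 of FIG. 10 may identify the file's column chunks (which may be blocks of data stored on storage device 120 of FIG. 1).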
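At block 1425 (FIG. 14B), column chunk processor 1010 of FIG. 10 may determine whether there are more column chunks (blocks) to access. If not, processing is complete. Otherwise, at block 1430, column chunk processor 1010 of FIG. 10 may access dictionary page 925 of FIG. 9 from column chunk 910-1, 910-2, or 910-3 of FIG. 9. If file metadata 905 of FIG. 9 stores dictionary page 920 of FIG. 9, column chunks 910-1, 910-2, and 910-3 of FIG. 9 may omit dictionary page 925 of FIG. 9, in which case block 1430 may be skipped, as shown by dashed line 1435. At block 1440, column chunk processor 1010 of FIG. 10 may access data pages 930-1, 930-2, and 930-3 of FIG. 9 from column chunks 910-1, 910-2, and 910-3 of FIG. 9. At block 1445, column chunk processor 1010 of FIG. 10 may forward input dictionary 805 of FIG. 8 and the column chunk's data pages 930-1, 930-2, and 930-3 of FIG. 9 (in order) to transcoder 420 of FIG. 6, stream splitter 610 of FIG. 6, or index mapper 615 of FIG. 6.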
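FIG. 15 shows a flowchart of an example procedure for index mapper 615 of FIG. 6 to map input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8, according to an embodiment of the inventive concept. In FIG. 15, at block 1505, index mapper 615 of FIG. 6 may receive input dictionary 805 of FIG. 8 (for example, from column chunk processor 1010 of FIG. 10). At block 1510, index mapper 615 of FIG. 6 may determine what data in input dictionary 805 of FIG. 8 is of interest, using, for example, a predicate received from host computer 105 of FIG. 1, possibly via in-storage compute controller 1005 of FIG. 10. At block 1515, index mapper 615 of FIG. 6 may generate output dictionary 810 of FIG. 8. Output dictionary 810 may include all the entries that host computer 105 of FIG. 1 is interested in, and may merge all the entries that host computer 105 of FIG. 1 is not interested in into a single "don't care" value. At block 1520, index mapper 615 of FIG. 6 may map values from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8. Finally, at block 1525, index mapper 615 of FIG. 6 may output output dictionary 810 of FIG. 8.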
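FIGs. 16A and 16B show a flowchart of an example procedure for in-storage compute controller 1005 of FIG. 10 to manage a predicate received from host computer 105 of FIG. 1 and potentially perform an acceleration function on transcoded data, according to an embodiment of the inventive concept. In FIG. 16A, at block 1605, in-storage compute controller 1005 of FIG. 10 may receive a predicate from host computer 105 of FIG. 1. At block 1610, in-storage compute controller 1005 of FIG. 10 may access input dictionary 805 of FIG. 8 for the encoded data covered by the query. At block 1615, in-storage compute controller 1005 of FIG. 10 may identify the entries in input dictionary 805 of FIG. 8 that the predicate covers (that is, the entries in input dictionary 805 of FIG. 8 that host computer 105 of FIG. 1 is interested in). At block 1620, in-storage compute controller 1005 of FIG. 10 may create output dictionary 810 of FIG. 8 containing the entries covered by the predicate. At block 1625, in-storage compute controller 1005 of FIG. 10 may map the entries in input dictionary 805 of FIG. 8 covered by the predicate to entries in output dictionary 810 of FIG. 8.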
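At block 1630, in-storage compute controller 1005 of FIG. 10 may identify the entries in input dictionary 805 of FIG. 8 that the predicate does not cover (that is, the entries in input dictionary 805 of FIG. 8 that host computer 105 of FIG. 1 is not interested in). At block 1635, in-storage compute controller 1005 of FIG. 10 may add a "don't care" entry to output dictionary 810 of FIG. 8. At block 1640 (FIG. 16B), in-storage compute controller 1005 of FIG. 10 may map the entries in the input dictionary not covered by the predicate to the "don't care" entry in output dictionary 810 of FIG. 8.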
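At block 1645, rule evaluator 635 of FIG. 6 (within transcoder 420 of FIG. 6) may use the predicate to adapt transcoding rules 630 of FIG. 6 to the query from host computer 105 of FIG. 1. At block 1650, index mapper 615 of FIG. 6 and rule evaluator 635 of FIG. 6 (both potentially within transcoder 420 of FIG. 6) may use the mapping from input dictionary 805 of FIG. 8 to output dictionary 810 of FIG. 8, together with transcoding rules 630 of FIG. 6, to transcode the encoded data from the input stream into the output stream (as discussed above with reference to FIGS. 12A-12C).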
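To make the transcoding step more concrete, the following sketch shows one way the dictionary mapping could be applied to encoded data without decoding it. It assumes, purely for illustration, that the input stream is run-length encoded as `(dictionary index, repeat count)` pairs and that the only transcoding rule is "merge adjacent runs that map to the same output index". The names `Run` and `transcode_runs` are hypothetical; the `previous` and `current` variables loosely play the roles of the previous and current encode buffers.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Assumed encoding: a run is (dictionary index, repeat count).
Run = Tuple[int, int]


def transcode_runs(runs: Iterable[Run], index_map: Dict[int, int]) -> Iterator[Run]:
    """Re-index runs through index_map and merge neighbors with equal output index."""
    previous: Optional[Run] = None  # role of the previous encode buffer
    for in_index, count in runs:
        # Role of the current encode buffer: the run after index mapping.
        current: Run = (index_map[in_index], count)
        if previous is not None and previous[0] == current[0]:
            # Rule: fold the current run into the previous run.
            previous = (previous[0], previous[1] + current[1])
        else:
            # Emit the previous run, then move current into previous.
            if previous is not None:
                yield previous
            previous = current
    if previous is not None:
        yield previous  # flush the last buffered run
```

Because several input entries may map to the same "don't care" index, runs that were distinct in the input can collapse into a single longer run in the output, which is where the size reduction in this sketch comes from.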
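At this point, various options exist. As shown in block 1655, in-storage compute controller 1005 of FIG. 10 may receive the output stream from transcoder 420 of FIG. 6 and may forward the transcoded data to host computer 105 of FIG. 1, and at block 1660, in-storage compute controller 1005 of FIG. 10 may send output dictionary 810 of FIG. 8 to host computer 105 of FIG. 1. Alternatively, at block 1665, in-storage compute controller 1005 of FIG. 10 may apply an acceleration function to the data in the output stream, and at block 1670, in-storage compute controller 1005 of FIG. 10 may send the result of the acceleration function to host computer 105 of FIG. 1.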
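The second option can be illustrated with a toy acceleration function. The passage does not specify what an acceleration function computes, so the example below is an assumption: it counts rows per output dictionary value directly from run-length-encoded `(index, count)` pairs, without expanding the runs. The name `count_values` and the run representation are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_values(
    runs: Iterable[Tuple[int, int]],
    output_dictionary: List[str],
) -> Dict[str, int]:
    """Count rows per dictionary value from (index, count) runs."""
    counts: Counter = Counter()
    for index, count in runs:
        # Each run contributes its repeat count to the value it references.
        counts[output_dictionary[index]] += count
    return dict(counts)
```

In such a design, only the small result dictionary would need to be sent to the host, rather than the transcoded stream and the output dictionary.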
In FIGS. 12A-16B, some embodiments of the inventive concept are shown. But a person skilled in the art will recognize that other embodiments of the inventive concept are also possible, by changing the order of the blocks, by omitting blocks, or by including links not shown in the drawings. All such variations of the flowcharts are considered to be embodiments of the inventive concept, whether expressly described or not.
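Embodiments of the inventive concept offer technical advantages over the prior art. In conventional systems, decoded data is sent to host computer 105 of FIG. 1. Even if the data sent to host computer 105 of FIG. 1 is selective (that is, the data sent to host computer 105 of FIG. 1 includes the data of interest), the data is still sent uncompressed or unencoded, which means that any space savings are achieved through selectivity. In contrast, because most of the storage reduction is achieved through encoding rather than compression, sending encoded data to host computer 105 of FIG. 1 generally involves sending less data to host computer 105 of FIG. 1 than sending decoded data would. In addition, because data may be transcoded from one encoding scheme to another, using transcoder 420 of FIG. 6 may be more efficient than decoding the data and re-encoding the data as separate operations.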
The following discussion is intended to provide a brief, general description of a suitable machine or machines in which certain aspects of the inventive concept may be implemented. The machine or machines may be controlled, at least in part, by input from conventional input devices, such as keyboards and mice, as well as by directives received from another machine, interaction with a virtual reality (VR) environment, biometric feedback, or other input signals. As used herein, the term "machine" is intended to broadly encompass a single machine, a virtual machine, or a system of communicatively coupled machines, virtual machines, or devices operating together. Exemplary machines include computing devices, such as personal computers, workstations, servers, portable computers, handheld devices, telephones, and tablets, as well as transportation devices, such as private or public transportation (e.g., automobiles, trains, and taxis).
The machine or machines may include embedded controllers, such as programmable or non-programmable logic devices or arrays, Application Specific Integrated Circuits (ASICs), embedded computers, smart cards, and the like. The machine or machines may utilize one or more connections to one or more remote machines, such as through a network interface, modem, or other communicative coupling. Machines may be interconnected by way of a physical and/or logical network, such as an intranet, the Internet, local area networks, or wide area networks. One skilled in the art will appreciate that network communication may utilize various wired and/or wireless short-range or long-range carriers and protocols, including radio frequency (RF), satellite, microwave, Institute of Electrical and Electronics Engineers (IEEE) 802.11, Bluetooth®, optical, infrared, cable, laser, etc.
Embodiments of the inventive concept may be described by reference to or in conjunction with associated data, including functions, procedures, data structures, application programs, etc., which, when accessed by a machine, result in the machine performing tasks or defining abstract data types or low-level hardware contexts. Associated data may be stored in, for example, volatile and/or non-volatile memory (e.g., RAM, ROM, etc.), or in other storage devices and their associated storage media, including hard drives, floppy disks, optical storage, tapes, flash memory, memory sticks, digital video disks, biological storage, etc. Associated data may be delivered over transmission environments, including the physical and/or logical network, in the form of packets, serial data, parallel data, propagated signals, etc., and may be used in a compressed or encrypted format. Associated data may be used in a distributed environment, and may be stored locally and/or remotely for machine access.
Embodiments of the inventive concept may include a tangible, non-transitory machine-readable medium comprising instructions executable by one or more processors, the instructions comprising instructions to perform the elements of the inventive concept as described herein.
The various operations of the methods described above may be performed by any suitable means capable of performing the operations, such as various hardware and/or software components, circuits, and/or modules. The software may comprise an ordered listing of executable instructions for implementing logical functions, and may be embodied in any "processor-readable medium" for use by or in connection with an instruction execution system, apparatus, or device, such as a single-core or multi-core processor or a processor-containing system.
The blocks or steps of a method or algorithm and functions described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. If implemented in software, the functions may be stored on or transmitted over a tangible, non-transitory computer-readable medium as one or more instructions or code. A software module may reside in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a Compact Disc Read Only Memory (CD ROM), or any other form of storage medium known in the art.
Having described and illustrated the principles of the inventive concept with reference to the illustrated embodiments, it will be recognized that the illustrated embodiments may be modified in arrangement and detail without departing from such principles, and may be combined in any desired manner. And, although the foregoing discussion has focused on particular embodiments, other configurations are contemplated. In particular, even though expressions such as "according to an embodiment of the inventive concept" or the like are used herein, these phrases are meant to generally reference embodiment possibilities, and are not intended to limit the inventive concept to particular embodiment configurations. As used herein, these terms may reference the same or different embodiments that are combinable into other embodiments.
The foregoing illustrative embodiments are not to be construed as limiting the inventive concept. Although a few embodiments have been described, those skilled in the art will readily appreciate that many modifications are possible to those embodiments without materially departing from the novel teachings and advantages of the present disclosure. Accordingly, all such modifications are intended to be included within the scope of the inventive concept as defined in the claims.
Embodiments of the inventive concept may extend to the following statements, without limitation:
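Statement 1. An embodiment of the inventive concept includes a transcoder, comprising: a buffer to store input encoded data; an index mapper to map from an input dictionary to an output dictionary; a current encode buffer to store modified current encoded data, the modified current encoded data responsive to the input encoded data, the input dictionary, and the map from the input dictionary to the output dictionary; a previous encode buffer to store modified previous encoded data, the modified previous encoded data responsive to previous input encoded data, the input dictionary, and the map from the input dictionary to the output dictionary; and a rule evaluator to generate an output stream responsive to the modified current encoded data in the current encode buffer, the modified previous encoded data in the previous encode buffer, and transcoding rules.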
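Statement 2. An embodiment of the inventive concept includes the transcoder according to statement 1, wherein the index mapper is responsive to the transcoding rules.
Statement 3. An embodiment of the inventive concept includes the transcoder according to statement 1, wherein the transcoding rules are responsive to the index mapper.
Statement 4. An embodiment of the inventive concept includes the transcoder according to statement 1, wherein the index mapper is responsive to a selected subset of entries in the input dictionary.
Statement 5. An embodiment of the inventive concept includes the transcoder according to statement 1, wherein the rule evaluator includes at least one of a processor, a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or a General Purpose GPU (GPGPU).
Statement 6. An embodiment of the inventive concept includes the transcoder according to statement 5, wherein the rule evaluator further includes at least one of software to implement the transcoding rules and storage for a table listing the transcoding rules.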
Statement 7. An embodiment of the inventive concept includes the transcoder according to statement 5, wherein the rule evaluator further includes circuitry to implement the transcoding rules.
Statement 8. An embodiment of the inventive concept includes the transcoder according to statement 1, wherein the rule evaluator is operative to generate the modified current encoded data from the input encoded data using the transcoding rules.
Statement 9. An embodiment of the inventive concept includes the transcoder according to statement 8, wherein the rule evaluator is operative to add the modified previous encoded data to the output stream.
Statement 10. An embodiment of the inventive concept includes the transcoder according to statement 9, wherein the rule evaluator is further operative to move the modified current encoded data from the current encode buffer into the previous encode buffer as the modified previous encoded data.
Statement 11. An embodiment of the inventive concept includes the transcoder according to statement 8, wherein the rule evaluator is operative to use the transcoding rules to modify the modified previous encoded data to include the modified current encoded data.
Statement 12. An embodiment of the inventive concept includes the transcoder according to statement 11, wherein the rule evaluator is further operative to change a first encoding scheme of the input encoded data to a second encoding scheme in generating the modified current encoded data.
Statement 13. An embodiment of the inventive concept includes the transcoder according to statement 11, wherein the rule evaluator is further operative to change a first encoding scheme of the modified previous encoded data to a second encoding scheme in generating the modified current encoded data.
Statement 14. An embodiment of the inventive concept includes the transcoder according to statement 8, wherein the rule evaluator is operative to determine, from the input encoded data, a first encoding scheme of the input encoded data, the first encoding scheme being one of at least two encoding schemes used by the input encoded data.
Statement 15. An embodiment of the inventive concept includes the transcoder according to statement 1, further comprising a stream splitter to identify a first chunk in the input encoded data that uses a first encoding scheme and a second chunk in the input encoded data that uses a second encoding scheme.
Statement 16. An embodiment of the inventive concept includes the transcoder according to statement 1, wherein the index mapper is operative to map at least one entry in the input dictionary to a "don't care" value in the output dictionary.
Statement 17. An embodiment of the inventive concept includes the transcoder according to statement 1, wherein the index mapper is operative to add a "don't care" value to the output dictionary.
Statement 18. An embodiment of the inventive concept includes the transcoder according to statement 1, wherein: the input encoded data is compressed input encoded data; and the transcoder further comprises a decompression engine.
Statement 19. An embodiment of the inventive concept includes the transcoder according to statement 1, wherein the transcoder is operative to produce the output stream from the input encoded data without decoding the input encoded data.
Statement 20. An embodiment of the inventive concept includes the transcoder according to statement 1, wherein the transcoder is included in a Solid State Drive (SSD) storage device.
Statement 21. An embodiment of the inventive concept includes the transcoder according to statement 20, wherein the input encoded data is received from storage within the SSD storage device.
Statement 22. An embodiment of the inventive concept includes a method, comprising: receiving, at a transcoder, a first chunk of data from input encoded data from a storage device; determining that the first chunk of data is of interest to a host computer; generating first encoded data from the first chunk of data based at least in part on the first chunk of data being of interest to the host computer; receiving, at the transcoder, a second chunk of data from the input encoded data from the storage device; determining that the second chunk of data is not of interest to the host computer; generating second encoded data from the second chunk of data based at least in part on the second chunk of data not being of interest to the host computer; and outputting the first encoded data and the second encoded data to the host computer.
Statement 23. An embodiment of the inventive concept includes the method according to statement 22, wherein generating second encoded data from the second chunk of data based at least in part on the second chunk of data not being of interest to the host computer includes changing a value in the first encoded data to a "don't care" value.
Statement 24. An embodiment of the inventive concept includes the method according to statement 23, wherein generating second encoded data from the second chunk of data based at least in part on the second chunk of data not being of interest to the host computer further includes combining the second encoded data with third encoded data that includes the "don't care" value.
Statement 25. An embodiment of the inventive concept includes the method according to statement 24, wherein generating second encoded data from the second chunk of data based at least in part on the second chunk of data not being of interest to the host computer further includes changing a first encoding scheme of at least one of the second chunk of data and the third encoded data to a second encoding scheme.
Statement 26. An embodiment of the inventive concept includes the method according to statement 25, wherein changing a first encoding scheme of at least one of the second chunk of data and the third encoded data to a second encoding scheme includes changing the first encoding scheme of the second chunk of data to the second encoding scheme in the second encoded data.
Statement 27. An embodiment of the inventive concept includes the method according to statement 25, wherein changing a first encoding scheme of at least one of the second chunk of data and the third encoded data to a second encoding scheme includes changing the first encoding scheme of the third encoded data to the second encoding scheme.
Statement 28. An embodiment of the inventive concept includes the method according to statement 22, wherein generating first encoded data from the first chunk of data based at least in part on the first chunk of data being of interest to the host computer includes combining the first encoded data with third encoded data.
Statement 29. An embodiment of the inventive concept includes the method according to statement 28, wherein generating first encoded data from the first chunk of data based at least in part on the first chunk of data being of interest to the host computer further includes changing a first encoding scheme of at least one of the first chunk of data and the third encoded data to a second encoding scheme.
Statement 30. An embodiment of the inventive concept includes the method according to statement 29, wherein changing a first encoding scheme of at least one of the first chunk of data and the third encoded data to a second encoding scheme includes changing the first encoding scheme of the first chunk of data to the second encoding scheme in the first encoded data.
Statement 31. An embodiment of the inventive concept includes the method according to statement 29, wherein changing a first encoding scheme of at least one of the first chunk of data and the third encoded data to a second encoding scheme includes changing the first encoding scheme of the third encoded data to the second encoding scheme.
Statement 32. An embodiment of the inventive concept includes the method according to statement 22, wherein: generating first encoded data from the first chunk of data based at least in part on the first chunk of data being of interest to the host computer includes generating the first encoded data from the first chunk of data based at least in part on transcoding rules; and generating second encoded data from the second chunk of data based at least in part on the second chunk of data not being of interest to the host computer includes generating the second encoded data from the second chunk of data based at least in part on the transcoding rules.
Statement 33. An embodiment of the inventive concept includes the method according to statement 22, wherein receiving, at a transcoder, a first chunk of data from input encoded data from a storage device includes: receiving the input encoded data at a stream splitter; identifying, by the stream splitter, the first chunk of data and the second chunk of data in the input encoded data, the first chunk of data encoded using a first encoding scheme and the second chunk of data encoded using a second encoding scheme; and receiving the first chunk of data from the input encoded data from the stream splitter.
Statement 34. An embodiment of the inventive concept includes the method according to statement 22, further comprising: receiving an input dictionary from the storage device; mapping the input dictionary to an output dictionary based at least in part on data that is of interest to the host computer and data that is not of interest to the host computer; and outputting the output dictionary to the host computer.
Statement 35. An embodiment of the inventive concept includes the method according to statement 34, wherein mapping the input dictionary to an output dictionary based at least in part on data that is of interest to the host computer and data that is not of interest to the host computer includes mapping the input dictionary to the output dictionary based at least in part on transcoding rules.
Statement 36. An embodiment of the inventive concept includes the method according to statement 34, wherein mapping the input dictionary to an output dictionary based at least in part on data that is of interest to the host computer and data that is not of interest to the host computer includes mapping the input dictionary to the output dictionary based at least in part on a selected subset of entries in the input dictionary.
Statement 37. An embodiment of the inventive concept includes the method according to statement 22, wherein the transcoder is operative to produce the first encoded data and the second encoded data from the input encoded data without decoding the input encoded data.
Statement 38. An embodiment of the inventive concept includes the method according to statement 22, wherein the transcoder is included in a Solid State Drive (SSD) storage device.
Statement 39. An embodiment of the inventive concept includes the method according to statement 38, wherein: receiving, at a transcoder, a first chunk of data from input encoded data from a storage device includes receiving, at the transcoder, the first chunk of data from the input encoded data from storage within the SSD storage device; and receiving, at the transcoder, a second chunk of data from the input encoded data from the storage device includes receiving, at the transcoder, the second chunk of data from the input encoded data from the storage within the SSD storage device.
Statement 40. An embodiment of the inventive concept includes an article, comprising a non-transitory storage medium, the non-transitory storage medium having stored thereon instructions that, when executed by a machine, result in: receiving, at a transcoder, a first chunk of data from input encoded data from a storage device; determining that the first chunk of data is of interest to a host computer; generating first encoded data from the first chunk of data based at least in part on the first chunk of data being of interest to the host computer; receiving, at the transcoder, a second chunk of data from the input encoded data from the storage device; determining that the second chunk of data is not of interest to the host computer; generating second encoded data from the second chunk of data based at least in part on the second chunk of data not being of interest to the host computer; and outputting the first encoded data and the second encoded data to the host computer.
Statement 41. An embodiment of the inventive concept includes the article according to statement 40, wherein generating second encoded data from the second chunk of data based at least in part on the second chunk of data not being of interest to the host computer includes changing a value in the first encoded data to a "don't care" value.
Statement 42. An embodiment of the inventive concept includes the article according to statement 41, wherein generating second encoded data from the second chunk of data based at least in part on the second chunk of data not being of interest to the host computer further includes combining the second encoded data with third encoded data that includes a "don't care" value.
Statement 43. An embodiment of the inventive concept includes the article according to statement 42, wherein generating second encoded data from the second chunk of data based at least in part on the second chunk of data not being of interest to the host computer further includes changing a first encoding scheme of at least one of the second chunk of data and the third encoded data to a second encoding scheme.
Statement 44. An embodiment of the inventive concept includes the article according to statement 43, wherein changing a first encoding scheme of at least one of the second chunk of data and the third encoded data to a second encoding scheme includes changing the first encoding scheme of the second chunk of data to the second encoding scheme in the second encoded data.
Statement 45. An embodiment of the inventive concept includes the article according to statement 43, wherein changing a first encoding scheme of at least one of the second chunk of data and the third encoded data to a second encoding scheme includes changing the first encoding scheme of the third encoded data to the second encoding scheme.
Statement 46. An embodiment of the inventive concept includes the article according to statement 40, wherein generating first encoded data from the first chunk of data based at least in part on the first chunk of data being of interest to the host computer includes combining the first encoded data with third encoded data.
Statement 47. An embodiment of the inventive concept includes the article according to statement 46, wherein generating first encoded data from the first chunk of data based at least in part on the first chunk of data being of interest to the host computer further includes changing a first encoding scheme of at least one of the first chunk of data and the third encoded data to a second encoding scheme.
Statement 48. An embodiment of the inventive concept includes the article according to statement 47, wherein changing a first encoding scheme of at least one of the first chunk of data and the third encoded data to a second encoding scheme includes changing the first encoding scheme of the first chunk of data to the second encoding scheme in the first encoded data.
Statement 49. An embodiment of the inventive concept includes the article according to statement 47, wherein changing a first encoding scheme of at least one of the first chunk of data and the third encoded data to a second encoding scheme includes changing the first encoding scheme of the third encoded data to the second encoding scheme.
Statement 50. An embodiment of the inventive concept includes the article according to statement 40, wherein: generating first encoded data from the first chunk of data based at least in part on the first chunk of data being of interest to the host computer includes generating the first encoded data from the first chunk of data based at least in part on transcoding rules; and generating second encoded data from the second chunk of data based at least in part on the second chunk of data not being of interest to the host computer includes generating the second encoded data from the second chunk of data based at least in part on the transcoding rules.
Statement 51. An embodiment of the inventive concept includes the article according to statement 40, wherein receiving, at a transcoder, a first chunk of data from input encoded data from a storage device includes: receiving the input encoded data at a stream splitter; identifying, by the stream splitter, the first chunk of data and the second chunk of data in the input encoded data, the first chunk of data encoded using a first encoding scheme and the second chunk of data encoded using a second encoding scheme; and receiving the first chunk of data from the input encoded data from the stream splitter.
Statement 52. An embodiment of the inventive concept includes the article according to statement 40, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in: receiving an input dictionary from the storage device; mapping the input dictionary to an output dictionary based at least in part on data that is of interest to the host computer and data that is not of interest to the host computer; and outputting the output dictionary to the host computer.
Statement 53. An embodiment of the inventive concept includes the article according to statement 52, wherein mapping the input dictionary to an output dictionary based at least in part on data that is of interest to the host computer and data that is not of interest to the host computer includes mapping the input dictionary to the output dictionary based at least in part on transcoding rules.
Statement 54. An embodiment of the inventive concept includes the article according to statement 52, wherein mapping the input dictionary to an output dictionary based at least in part on data that is of interest to the host computer and data that is not of interest to the host computer includes mapping the input dictionary to the output dictionary based at least in part on a selected subset of entries in the input dictionary.
陳述55.本發明概念的實施例包括如陳述40所述的製品,其中所述轉換編碼器運作以在不對所述輸入已編碼資料進行解碼的情況下自所述輸入已編碼資料產生所述第一已編碼資料及所述第二已編碼資料。 Statement 55. Embodiments of the inventive concept include the article of Statement 40, wherein the transcoder operates to generate the first encoded data from the input encoded data without decoding the input encoded data. one encoded data and said second encoded data.
陳述56.本發明概念的實施例包括如陳述40所述的製品,其中所述轉換編碼器包含於固態驅動機(SSD)儲存裝置中。 Statement 56. Embodiments of the inventive concept include the article of statement 40, wherein the transcoder is included in a solid state drive (SSD) storage device.
陳述57.本發明概念的實施例包括如陳述56所述的製品,其中:在轉換編碼器處自儲存裝置接收來自輸入已編碼資料的第一資料組塊包括在所述轉換編碼器處自所述SSD儲存裝置內的儲存器接收來自所述輸入已編碼資料的所述第一資料組塊;且在所述轉換編碼器處自所述儲存裝置接收來自所述輸入已編碼資料的第二資料組塊包括在所述轉換編碼器處自所述SSD儲存裝置內的所述儲存器接收來自所述輸入已編碼資料的所述第二資料組塊。 Statement 57. Embodiments of the inventive concept include the article of statement 56, wherein receiving the first data chunk from the input encoded data at the transform encoder from the storage device includes receiving at the transform encoder from the storage device. Storage within the SSD storage device receives the first chunk of data from the input encoded data; and receiving second data from the storage device at the transcoder from the input encoded data Chunking includes receiving, at the transcoder, the second data chunk from the input encoded data from the storage within the SSD storage device.
Statement 58. An embodiment of the inventive concept includes a storage device, comprising: storage for input encoded data; a controller to process read requests and write requests from a host computer on the storage; an In-Storage Computing (ISC) controller to receive a predicate originating from the host computer, the predicate to be applied to the input encoded data stored in the storage; and a transcoder including an index mapper to map an input dictionary of the input encoded data to an output dictionary, the input dictionary including at least one first entry and at least one second entry, the at least one first entry mapped to at least one third entry in the output dictionary, and the at least one second entry mapped to a "don't care" entry in the output dictionary.
Statement 59. An embodiment of the inventive concept includes the storage device according to statement 58, wherein the transcoder includes at least one of a processor, a Field Programmable Gate Array (FPGA), an Application-Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or a General-Purpose GPU (GPGPU).
Statement 60. An embodiment of the inventive concept includes the storage device according to statement 58, wherein the ISC controller is operative to apply an acceleration function to output encoded data from the transcoder.
Statement 61. An embodiment of the inventive concept includes the storage device according to statement 60, wherein the ISC controller is further operative to output to the host computer a result of the acceleration function applied to the output encoded data from the transcoder.
Statement 62. An embodiment of the inventive concept includes the storage device according to statement 58, wherein the ISC controller is operative to forward output encoded data of the transcoder to the host computer.
Statement 63. An embodiment of the inventive concept includes the storage device according to statement 62, wherein the ISC controller is further operative to forward the output dictionary to the host computer.
Statement 64. An embodiment of the inventive concept includes the storage device according to statement 58, wherein the transcoder is operative to generate output encoded data based at least in part on the input encoded data and the mapping from the input dictionary to the output dictionary.
Statement 65. An embodiment of the inventive concept includes the storage device according to statement 64, wherein the transcoder includes: a buffer to store the input encoded data; the index mapper; a current encoding buffer to store modified current encoded data, the modified current encoded data responsive to the input encoded data and the mapping from the input dictionary to the output dictionary; a previous encoding buffer to store modified previous encoded data, the modified previous encoded data responsive to previous input encoded data and the mapping from the input dictionary to the output dictionary; and a rule evaluator to generate an output stream responsive to the modified current encoded data in the current encoding buffer, the modified previous encoded data in the previous encoding buffer, and transcoding rules.
Statement 66. An embodiment of the inventive concept includes the storage device according to statement 65, wherein the transcoding rules are based at least in part on the predicate.
Statement 67. An embodiment of the inventive concept includes the storage device according to statement 65, wherein the rule evaluator generates the output stream, without decoding the input encoded data, responsive to the modified current encoded data in the current encoding buffer, the modified previous encoded data in the previous encoding buffer, and the transcoding rules.
Statement 68. An embodiment of the inventive concept includes the storage device according to statement 64, wherein: the input encoded data uses a first encoding scheme; the output encoded data uses a second encoding scheme; and the second encoding scheme is different from the first encoding scheme.
Statement 69. An embodiment of the inventive concept includes the storage device according to statement 58, wherein the input encoded data is stored in the storage in a columnar format.
Statement 70. An embodiment of the inventive concept includes the storage device according to statement 69, wherein the input encoded data includes an input file stored using the Apache Parquet storage format.
Statement 71. An embodiment of the inventive concept includes the storage device according to statement 69, further comprising a column chunk processor to process a column chunk including the input encoded data and to forward the input encoded data to the transcoder.
Statement 72. An embodiment of the inventive concept includes the storage device according to statement 71, wherein the column chunk processor includes the transcoder.
Statement 73. An embodiment of the inventive concept includes the storage device according to statement 71, wherein the column chunk processor includes at least one of a processor, a Field Programmable Gate Array (FPGA), an Application-Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or a General-Purpose GPU (GPGPU).
Statement 74. An embodiment of the inventive concept includes the storage device according to statement 58, wherein the transcoder is operative to generate, based at least in part on the predicate, a transcoding rule to be applied to the input encoded data to produce output encoded data.
Statement 75. An embodiment of the inventive concept includes the storage device according to statement 74, wherein the transcoder is operative to produce the output encoded data without decoding the input encoded data.
Statement 76. An embodiment of the inventive concept includes a method, comprising: receiving, at a transcoder, a predicate to be applied to input encoded data; accessing an input dictionary of the input encoded data; identifying at least one first entry in the input dictionary covered by the predicate and at least one second entry in the input dictionary not covered by the predicate; generating an output dictionary, the output dictionary excluding the at least one second entry in the input dictionary not covered by the predicate, the transcoding dictionary including at least a third entry and a "don't care" entry; and generating, by the transcoder, a dictionary map that maps the at least one first entry in the input dictionary to the at least one third entry in the output dictionary and maps the at least one second entry in the input dictionary not covered by the predicate to the "don't care" entry in the output dictionary.
Statement 77. An embodiment of the inventive concept includes the method according to statement 76, wherein the input encoded data is stored in a columnar format.
Statement 78. An embodiment of the inventive concept includes the method according to statement 77, wherein the input encoded data includes an input file stored using the Apache Parquet storage format.
Statement 79. An embodiment of the inventive concept includes the method according to statement 76, wherein the input encoded data includes a column chunk stored in a columnar format.
Statement 80. An embodiment of the inventive concept includes the method according to statement 76, further comprising: transcoding the input encoded data into output encoded data using the dictionary map; and outputting the output encoded data.
Statement 81. An embodiment of the inventive concept includes the method according to statement 80, wherein transcoding the input encoded data into output encoded data using the dictionary map includes: receiving, at the transcoder, a first data chunk from the input encoded data; determining that the first data chunk is covered by the predicate; generating, using the dictionary map, first encoded data from the first data chunk based at least in part on the host computer being interested in the first data chunk; receiving, at the transcoder, a second data chunk from the input encoded data from the storage device; determining that the second data chunk is not covered by the predicate; generating, using the dictionary map, second encoded data from the second data chunk based at least in part on the host computer not being interested in the second data chunk; and outputting the first encoded data and the second encoded data.
Statement 82. An embodiment of the inventive concept includes the method according to statement 81, wherein receiving, at the transcoder, a first data chunk from the input encoded data includes: receiving, at a column chunk processor, a list of block identifiers (IDs) from an In-Storage Computing (ISC) controller; accessing, by the column chunk processor, a column chunk including the block IDs in the list of block IDs; retrieving, by the column chunk processor, the input encoded data from the column chunk; and forwarding the input encoded data from the column chunk processor to the transcoder.
Statement 83. An embodiment of the inventive concept includes the method according to statement 81, further comprising generating, based at least in part on the predicate, a transcoding rule to be applied to the input encoded data.
Statement 84. An embodiment of the inventive concept includes the method according to statement 80, wherein transcoding the input encoded data into output encoded data using the dictionary map includes transcoding the input encoded data into the output encoded data using the dictionary map without decoding the input encoded data.
Statement 85. An embodiment of the inventive concept includes the method according to statement 80, wherein: the input encoded data uses a first encoding scheme; the output encoded data uses a second encoding scheme; and the second encoding scheme is different from the first encoding scheme.
Statement 86. An embodiment of the inventive concept includes the method according to statement 80, wherein outputting the output encoded data includes outputting the output encoded data to an ISC controller.
Statement 87. An embodiment of the inventive concept includes the method according to statement 86, wherein outputting the output encoded data to an ISC controller further includes outputting the output dictionary to the ISC controller.
Statement 88. An embodiment of the inventive concept includes the method according to statement 87, further comprising forwarding the output encoded data and the output dictionary from the ISC controller to a host computer.
Statement 89. An embodiment of the inventive concept includes the method according to statement 87, further comprising performing, by the ISC controller, an acceleration function on the output encoded data to produce accelerated data.
Statement 90. An embodiment of the inventive concept includes the method according to statement 89, further comprising outputting the accelerated data from the ISC controller to a host computer.
Statement 91. An embodiment of the inventive concept includes the method according to statement 76, further comprising outputting the output dictionary.
Statement 92. An embodiment of the inventive concept includes the method according to statement 76, wherein receiving a predicate to be applied to input encoded data includes receiving, from an ISC controller, the predicate to be applied to the input encoded data.
Statement 93. An embodiment of the inventive concept includes the method according to statement 92, further comprising receiving the input dictionary from the ISC controller.
Statement 94. An embodiment of the inventive concept includes the method according to statement 76, further comprising: determining that no entry in the input dictionary is left uncovered by the predicate; and outputting the input encoded data without transcoding the input encoded data into output encoded data.
Statement 95. An embodiment of the inventive concept includes an article, comprising a non-transitory storage medium, the non-transitory storage medium having stored thereon instructions that, when executed by a machine, result in: receiving, at a transcoder, a predicate to be applied to input encoded data; accessing an input dictionary of the input encoded data; identifying at least one first entry in the input dictionary covered by the predicate and at least one second entry in the input dictionary not covered by the predicate; generating an output dictionary, the output dictionary excluding the at least one second entry in the input dictionary not covered by the predicate, the transcoding dictionary including at least a third entry and a "don't care" entry; and generating, by the transcoder, a dictionary map that maps the at least one first entry in the input dictionary to the at least one third entry in the output dictionary and maps the at least one second entry in the input dictionary not covered by the predicate to the "don't care" entry in the output dictionary.
Statement 96. An embodiment of the inventive concept includes the article according to statement 95, wherein the input encoded data is stored in a columnar format.
Statement 97. An embodiment of the inventive concept includes the article according to statement 96, wherein the input encoded data includes an input file stored using the Apache Parquet storage format.
Statement 98. An embodiment of the inventive concept includes the article according to statement 95, wherein the input encoded data includes a column chunk stored in a columnar format.
Statement 99. An embodiment of the inventive concept includes the article according to statement 95, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in: transcoding the input encoded data into output encoded data using the dictionary map; and outputting the output encoded data.
Statement 100. An embodiment of the inventive concept includes the article according to statement 99, wherein transcoding the input encoded data into output encoded data using the dictionary map includes: receiving, at the transcoder, a first data chunk from the input encoded data; determining that the first data chunk is covered by the predicate; generating, using the dictionary map, first encoded data from the first data chunk based at least in part on the host computer being interested in the first data chunk; receiving, at the transcoder, a second data chunk from the input encoded data from the storage device; determining that the second data chunk is not covered by the predicate; generating, using the dictionary map, second encoded data from the second data chunk based at least in part on the host computer not being interested in the second data chunk; and outputting the first encoded data and the second encoded data.
Statement 101. An embodiment of the inventive concept includes the article according to statement 100, wherein receiving, at the transcoder, a first data chunk from the input encoded data includes: receiving, at a column chunk processor, a list of block identifiers (IDs) from an In-Storage Computing (ISC) controller; accessing, by the column chunk processor, a column chunk including the block IDs in the list of block IDs; retrieving, by the column chunk processor, the input encoded data from the column chunk; and forwarding the input encoded data from the column chunk processor to the transcoder.
Statement 102. An embodiment of the inventive concept includes the article according to statement 100, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in generating, based at least in part on the predicate, a transcoding rule to be applied to the input encoded data.
Statement 103. An embodiment of the inventive concept includes the article according to statement 99, wherein transcoding the input encoded data into output encoded data using the dictionary map includes transcoding the input encoded data into the output encoded data using the dictionary map without decoding the input encoded data.
Statement 104. An embodiment of the inventive concept includes the article according to statement 99, wherein: the input encoded data uses a first encoding scheme; the output encoded data uses a second encoding scheme; and the second encoding scheme is different from the first encoding scheme.
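Statement 105. An embodiment of the inventive concept includes the article according to statement 99, wherein outputting the output encoded data includes outputting the output encoded data to an ISC controller.
Statement 106. An embodiment of the inventive concept includes the article according to statement 105, wherein outputting the output encoded data to an ISC controller further includes outputting the output dictionary to the ISC controller.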
Statement 107. An embodiment of the inventive concept includes the article according to statement 106, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in forwarding the output encoded data and the output dictionary from the ISC controller to a host computer.
Statement 108. An embodiment of the inventive concept includes the article according to statement 106, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in performing, by the ISC controller, an acceleration function on the output encoded data to produce accelerated data.
Statement 109. An embodiment of the inventive concept includes the article according to statement 108, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in outputting the accelerated data from the ISC controller to a host computer.
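Statement 110. An embodiment of the inventive concept includes the article according to statement 95, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in outputting the output dictionary.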
Statement 111. An embodiment of the inventive concept includes the article according to statement 95, wherein receiving a predicate to be applied to input encoded data includes receiving, from an ISC controller, the predicate to be applied to the input encoded data.
Statement 112. An embodiment of the inventive concept includes the article according to statement 111, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in receiving the input dictionary from the ISC controller.
Statement 113. An embodiment of the inventive concept includes the article according to statement 95, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in: determining that no entry in the input dictionary is left uncovered by the predicate; and outputting the input encoded data without transcoding the input encoded data into output encoded data.
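As an illustration of the dictionary mapping recited in statements 76 and 95, and of the bypass in statements 94 and 113, the following is a minimal sketch and not the claimed implementation. The names `DONT_CARE`, `build_dictionary_map`, and `transcode_indices`, the list-based dictionary layout, and the choice to place the "don't care" entry last in the output dictionary are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

DONT_CARE = "*"  # hypothetical marker for the "don't care" entry


def build_dictionary_map(
    input_dict: List[str], predicate: Callable[[str], bool]
) -> Tuple[List[str], Dict[int, int]]:
    """Build an output dictionary and an input-index -> output-index map."""
    # Entries covered by the predicate keep their relative order.
    output_dict = [value for value in input_dict if predicate(value)]
    dont_care_index = len(output_dict)
    # Add a single "don't care" entry only if some entry is not covered.
    if len(output_dict) < len(input_dict):
        output_dict.append(DONT_CARE)

    index_map: Dict[int, int] = {}
    next_index = 0
    for i, value in enumerate(input_dict):
        if predicate(value):
            index_map[i] = next_index
            next_index += 1
        else:
            index_map[i] = dont_care_index
    return output_dict, index_map


def transcode_indices(indices: List[int], index_map: Dict[int, int]) -> List[int]:
    """Rewrite dictionary indices; the dictionary values are never decoded."""
    # An identity map changes nothing, so the input can be passed through.
    if all(src == dst for src, dst in index_map.items()):
        return indices
    return [index_map[i] for i in indices]


input_dict = ["US", "KR", "TW", "JP"]
output_dict, index_map = build_dictionary_map(
    input_dict, lambda v: v in {"KR", "TW"}
)
encoded = transcode_indices([0, 1, 1, 3, 2], index_map)
```

In this sketch the two entries the host is not interested in collapse onto one output index, which is what lets later encoding stages treat them as interchangeable.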
Accordingly, in view of the wide variety of permutations of the embodiments described herein, this detailed description and the accompanying material are intended to be illustrative only and should not be taken as limiting the scope of the inventive concept. What is claimed as the inventive concept, therefore, is all such modifications as may come within the scope and spirit of the following claims and equivalents thereto.
420: Transcoder
605: Circular buffer/buffer
610: Stream splitter
615: Index mapper
620: Current encoding buffer
625: Previous encoding buffer
630: Transcoding rules
635: Rule evaluator
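The components listed above can be related to one another with a small sketch. It assumes, purely for illustration, that the input is run-length encoded as `(dictionary index, run length)` pairs and that the only transcoding rule is "merge adjacent runs that map to the same output index"; the function name `transcode_runs` and the sample `index_map` are hypothetical, and the reference numerals in the comments only indicate which component each step loosely corresponds to.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Run = Tuple[int, int]  # (dictionary index, run length)


def transcode_runs(runs: Iterable[Run], index_map: Dict[int, int]) -> Iterator[Run]:
    """Re-encode run-length data through a dictionary map without expanding runs."""
    previous: Optional[Run] = None  # previous encoding buffer (625)
    for index, length in runs:  # runs as they arrive from the buffer (605)
        # Index mapper (615) fills the current encoding buffer (620).
        current: Run = (index_map[index], length)
        # Rule evaluator (635) applying one assumed transcoding rule (630):
        # adjacent runs with the same output index are merged.
        if previous is not None and previous[0] == current[0]:
            previous = (previous[0], previous[1] + current[1])
        else:
            if previous is not None:
                yield previous
            previous = current
    if previous is not None:
        yield previous  # flush the last pending run


# Hypothetical map: input indices 0 and 3 both go to the "don't care" index 2.
index_map = {0: 2, 1: 0, 2: 1, 3: 2}
runs_in = [(0, 3), (3, 2), (1, 4), (2, 1), (0, 5)]
runs_out = list(transcode_runs(runs_in, index_map))
```

Because the first two input runs both map to the "don't care" index, they leave the sketch as a single longer run, while the total number of encoded values is unchanged.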
Applications Claiming Priority (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962834900P | 2019-04-16 | 2019-04-16 | |
US62/834,900 | 2019-04-16 | ||
US201962945883P | 2019-12-09 | 2019-12-09 | |
US201962945877P | 2019-12-09 | 2019-12-09 | |
US62/945,883 | 2019-12-09 | ||
US62/945,877 | 2019-12-09 | ||
US16/820,665 | 2020-03-16 | ||
US16/820,665 US11139827B2 (en) | 2019-03-15 | 2020-03-16 | Conditional transcoding for encoded data |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202107856A TW202107856A (en) | 2021-02-16 |
TWI825305B true TWI825305B (en) | 2023-12-11 |
Family
ID=72913839
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW109112659A TWI825305B (en) | 2019-04-16 | 2020-04-15 | Transcoder and method and article for transcoding |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP7381393B2 (en) |
KR (2) | KR20200121760A (en) |
CN (1) | CN111832257B (en) |
TW (1) | TWI825305B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11791838B2 (en) * | 2021-01-15 | 2023-10-17 | Samsung Electronics Co., Ltd. | Near-storage acceleration of dictionary decoding |
CN115719059B (en) * | 2022-11-29 | 2023-08-08 | 北京中科智加科技有限公司 | Morse grouping error correction method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5861827A (en) * | 1996-07-24 | 1999-01-19 | Unisys Corporation | Data compression and decompression system with immediate dictionary updating interleaved with string search |
TW410311B (en) * | 1994-09-30 | 2000-11-01 | Ricoh Kk | Method and apparatus for encoding and decoding data |
US20030138158A1 (en) * | 1994-09-21 | 2003-07-24 | Schwartz Edward L. | Multiple coder technique |
TW200844798A (en) * | 2007-04-30 | 2008-11-16 | Jen-Te Chen | Decoding method utilizing temporally ambiguous code and apparatus using the same |
US20170155403A1 (en) * | 2014-10-07 | 2017-06-01 | Doron Kletter | Enhanced data compression for sparse multidimensional ordered series data |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1821414B1 (en) * | 2004-12-07 | 2016-06-22 | Nippon Telegraph And Telephone Corporation | Information compression-coding device, method thereof, program thereof and recording medium storing the program |
US7102552B1 (en) * | 2005-06-07 | 2006-09-05 | Windspring, Inc. | Data compression with edit-in-place capability for compressed data |
JP4266218B2 (en) | 2005-09-29 | 2009-05-20 | 株式会社東芝 | Recompression encoding method, apparatus, and program for moving image data |
US8090027B2 (en) * | 2007-08-29 | 2012-01-03 | Red Hat, Inc. | Data compression using an arbitrary-sized dictionary |
US7889102B2 (en) * | 2009-02-26 | 2011-02-15 | Red Hat, Inc. | LZSS with multiple dictionaries and windows |
US8159374B2 (en) * | 2009-11-30 | 2012-04-17 | Red Hat, Inc. | Unicode-compatible dictionary compression |
US9779071B2 (en) * | 2015-07-13 | 2017-10-03 | Fujitsu Limited | Non-transitory computer-readable recording medium, encoding method, encoding apparatus, decoding method, and decoding apparatus |
JP2017028372A (en) | 2015-07-16 | 2017-02-02 | 沖電気工業株式会社 | Coding scheme conversion device, method and program |
CN108197087B (en) * | 2018-01-18 | 2021-11-16 | 奇安信科技集团股份有限公司 | Character code recognition method and device |
-
2020
- 2020-04-15 TW TW109112659A patent/TWI825305B/en active
- 2020-04-16 CN CN202010298627.5A patent/CN111832257B/en active Active
- 2020-04-16 KR KR1020200046249A patent/KR20200121760A/en not_active Ceased
- 2020-04-16 JP JP2020073662A patent/JP7381393B2/en active Active
-
2024
- 2024-05-20 KR KR1020240065311A patent/KR20240078422A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
JP7381393B2 (en) | 2023-11-15 |
CN111832257B (en) | 2023-02-28 |
CN111832257A (en) | 2020-10-27 |
TW202107856A (en) | 2021-02-16 |
JP2020178347A (en) | 2020-10-29 |
KR20240078422A (en) | 2024-06-03 |
KR20200121761A (en) | 2020-10-26 |
KR20200121760A (en) | 2020-10-26 |