GB201311399D0 - Method and data processing system for encoding data stored in a column-oriented manner, data processing program and computer program product
Info
- Publication number
- GB201311399D0
- Authority
- GB
- United Kingdom
- Prior art keywords
- column
- data processing
- data
- oriented manner
- columns
- Prior art date
- Legal status
- Granted
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/46—Conversion to or from run-length codes, i.e. by representing the number of consecutive digits, or groups of digits, of the same kind by a code word and a digit indicative of that kind
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Data stored in a column-oriented manner is encoded using a data mining algorithm that finds column patterns among a set of data tuples. Each data tuple contains a set of columns, and the data mining algorithm treats all columns, column combinations and column orderings in the same or a similar manner when looking for column patterns. Column values occurring in the column patterns are ordered, based on their frequencies, into a prefix tree, which defines a pattern order. The data tuples are sorted according to the pattern order, and the columns of the sorted data tuples are encoded using run-length encoding.
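The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: as a simplifying assumption it uses single (column, value) frequencies in place of a full pattern-mining pass, and it represents the prefix tree implicitly, by sorting each tuple's items from most to least frequent and comparing the resulting paths. The function names `pattern_order_sort`, `rle` and `encode_columns` are hypothetical.

```python
from collections import Counter
from itertools import groupby


def pattern_order_sort(tuples):
    """Sort tuples so rows sharing frequent (column, value) items become adjacent."""
    # Count how often each (column index, value) item occurs across all tuples.
    freq = Counter((c, v) for row in tuples for c, v in enumerate(row))
    # Rank items, most frequent first; ties are broken by column index, then repr.
    ranked = sorted(freq, key=lambda item: (-freq[item], item[0], repr(item[1])))
    rank = {item: i for i, item in enumerate(ranked)}

    def path(row):
        # The row's path through an implicit prefix tree:
        # its items listed from most frequent to least frequent.
        return sorted(rank[(c, v)] for c, v in enumerate(row))

    # Comparing paths lexicographically mimics a depth-first walk of the tree.
    return sorted(tuples, key=path)


def rle(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def encode_columns(tuples):
    """Sort the tuples by pattern order, then run-length encode each column."""
    ordered = pattern_order_sort(tuples)
    return [rle(column) for column in zip(*ordered)]


rows = [
    ("DE", "red", 1),
    ("US", "blue", 2),
    ("DE", "red", 3),
    ("US", "red", 4),
    ("DE", "blue", 5),
]
encoded = encode_columns(rows)
print(encoded[0])  # first column after sorting: [('DE', 3), ('US', 2)]
```

Sorting by frequency-ranked paths places tuples that share common values next to each other, which lengthens the runs in the most repetitive columns. Columns with mostly unique values, such as the third one here, gain nothing from run-length encoding.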
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP10193677 | 2010-12-03 | | |
| PCT/EP2011/069324 WO2012072364A1 (en) | 2010-12-03 | 2011-11-03 | Method and data processing system for encoding data stored in a column-oriented manner, data processing program and computer program product |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| GB201311399D0 (en) | 2013-08-14 |
| GB2500532A (en) | 2013-09-25 |
| GB2500532B (en) | 2018-02-21 |
Family
ID=44907874
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB1311399.8A Active GB2500532B (en) | 2010-12-03 | 2011-11-03 | Method and data processing system for encoding data stored in a column-oriented manner, data processing program and computer program product |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US9325344B2 (en) |
| DE (1) | DE112011104005T5 (en) |
| GB (1) | GB2500532B (en) |
| WO (1) | WO2012072364A1 (en) |
Families Citing this family (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8805811B2 (en) * | 2012-04-30 | 2014-08-12 | Hewlett-Packard Development Company, L.P. | Executing user-defined function on a plurality of database tuples |
| JP5826114B2 (en) * | 2012-05-25 | 2015-12-02 | クラリオン株式会社 | Data decompression device, data compression device, data decompression program, data compression program, and compressed data distribution system |
| US20140173033A1 (en) * | 2012-12-17 | 2014-06-19 | Salesforce.Com, Inc. | System, method and computer program product for processing data in a dynamic and generic manner |
| US10019457B1 (en) | 2013-01-22 | 2018-07-10 | Amazon Technologies, Inc. | Multi-level compression for storing data in a data store |
| US9195711B2 (en) | 2013-03-11 | 2015-11-24 | International Business Machines Corporation | Persisting and retrieving arbitrary slices of nested structures using a column-oriented data store |
| US9384204B2 (en) | 2013-05-22 | 2016-07-05 | Amazon Technologies, Inc. | Efficient data compression and analysis as a service |
| KR101522870B1 (en) * | 2013-10-01 | 2015-05-26 | 주식회사 파수닷컴 | Apparatus and method for encrypting data column |
| GB201322057D0 (en) * | 2013-12-13 | 2014-01-29 | Qatar Foundation | Descriptive and prescriptive data cleaning |
| GB201409214D0 (en) * | 2014-05-23 | 2014-07-09 | Ibm | A method and system for processing a data set |
| US10303685B2 (en) * | 2015-06-08 | 2019-05-28 | International Business Machines Corporation | Data table performance optimization |
| US10235100B2 (en) * | 2016-08-23 | 2019-03-19 | Sap Se | Optimizing column based database table compression |
| US10515092B2 (en) * | 2017-07-21 | 2019-12-24 | Google Llc | Structured record compression and retrieval |
| US20190050436A1 (en) * | 2017-08-14 | 2019-02-14 | International Business Machines Corporation | Content-based predictive organization of column families |
| US11755927B2 (en) * | 2019-08-23 | 2023-09-12 | Bank Of America Corporation | Identifying entitlement rules based on a frequent pattern tree |
| US11386111B1 (en) * | 2020-02-11 | 2022-07-12 | Massachusetts Mutual Life Insurance Company | Systems, devices, and methods for data analytics |
| CN112347104B (en) * | 2020-11-06 | 2023-09-29 | 中国人民大学 | Column storage layout optimization method based on deep reinforcement learning |
| US20220350802A1 (en) * | 2021-04-29 | 2022-11-03 | International Business Machines Corporation | Query performance |
| CN115167755A (en) * | 2022-05-05 | 2022-10-11 | 山东大学 | Ethernet workshop data storage method and system based on shared prefix |
| US12306812B2 (en) * | 2023-10-27 | 2025-05-20 | International Business Machines Corporation | In-database data cleansing and independent store of clean data |
| US12360969B2 (en) | 2023-10-27 | 2025-07-15 | International Business Machines Corporation | In-database data cleansing |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2001086577A2 (en) * | 2000-05-10 | 2001-11-15 | E. I. Du Pont De Nemours And Company | Method of discovering patterns in symbol sequences |
| US7024414B2 (en) | 2001-08-06 | 2006-04-04 | Sensage, Inc. | Storage of row-column data |
| US7539685B2 (en) | 2003-12-30 | 2009-05-26 | Microsoft Corporation | Index key normalization |
| US7496589B1 (en) | 2005-07-09 | 2009-02-24 | Google Inc. | Highly compressed randomly accessed storage of large tables with arbitrary columns |
| US9195695B2 (en) * | 2006-09-15 | 2015-11-24 | Ibm International Group B.V. | Technique for compressing columns of data |
| US8266147B2 (en) | 2006-09-18 | 2012-09-11 | Infobright, Inc. | Methods and systems for database organization |
| US7730106B2 (en) | 2006-12-28 | 2010-06-01 | Teradata Us, Inc. | Compression of encrypted data in database management systems |
| US7769729B2 (en) | 2007-05-21 | 2010-08-03 | Sap Ag | Block compression of tables with repeated values |
| US8356060B2 (en) * | 2009-04-30 | 2013-01-15 | Oracle International Corporation | Compression analyzer |
2011
- 2011-08-10: US application US13/206,827, published as US9325344B2 (en), status Active
- 2011-11-03: GB application GB1311399.8A, published as GB2500532B (en), status Active
- 2011-11-03: WO application PCT/EP2011/069324, published as WO2012072364A1 (en), status Ceased
- 2011-11-03: DE application DE112011104005T, published as DE112011104005T5 (en), status Pending
Also Published As
| Publication number | Publication date |
|---|---|
| US20120143913A1 (en) | 2012-06-07 |
| GB2500532A (en) | 2013-09-25 |
| WO2012072364A1 (en) | 2012-06-07 |
| GB2500532B (en) | 2018-02-21 |
| DE112011104005T5 (en) | 2013-08-29 |
| US9325344B2 (en) | 2016-04-26 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| GB201311399D0 (en) | Method and data processing system for encoding data stored in a column-oriented manner, data processing program and computer program product | |
| WO2013001535A3 (en) | System, method and data structure for fast loading, storing and access to huge data sets in real time | |
| WO2013090233A8 (en) | Distributed computing in a distributed storage and task network | |
| IN2014DN09960A (en) | ||
| Maruyama et al. | ESP-index: A compressed index based on edit-sensitive parsing | |
| GB2515938A (en) | A multi-layer system for symbol-space based compression of patterns | |
| IN2013CH04496A (en) | ||
| MX2021006632A (en) | Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device. | |
| WO2013016679A3 (en) | Systems and methods for generating and using a digital pass | |
| GB2540298A (en) | Signature retrieval and matching for media monitoring | |
| WO2011077300A3 (en) | Processing of geological data | |
| WO2014043366A3 (en) | Optimal data representation and auxiliary structures for in-memory database query processing | |
| WO2014144833A3 (en) | Taste profile attributes | |
| PH12019500795A1 (en) | Method and system for the transmission of bioinformatics data | |
| WO2014188290A3 (en) | Fast and secure retrieval of dna sequences | |
| EP3242227A4 (en) | Page querying method and data processing node in oltp cluster database | |
| IL241640B (en) | Method for executing queries on streaming data using graphic processing units | |
| WO2015044442A3 (en) | Method for generating a sequence of binary code words of a multi-bit code for a control signal for a consumer | |
| MY182481A (en) | Transmitter and shortening method thereof | |
| GB2562352A (en) | Post-decoding error check with diagnostics for product codes | |
| WO2013134662A3 (en) | Systems and methods for creating a temporal content profile | |
| IL230741B (en) | Systems and methods for keyword spotting using alternating search algorithms | |
| Wei et al. | Robust forecast combinations | |
| Couceiro et al. | Decompositions of functions based on arity gap | |
| WO2011133302A3 (en) | Multi-threaded sort of data items in spreadsheet tables |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 746 | Register noted 'licences of right' (sect. 46/1977) | Effective date: 20180306 |