GB201311399D0 - Method and data processing system for encoding data stored in a column-oriented manner, data processing program and computer program product - Google Patents

Info

Publication number
GB201311399D0
Authority
GB
United Kingdom
Prior art keywords
column
data processing
data
oriented manner
columns
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GBGB1311399.8A
Other versions
GB2500532A (en)
GB2500532B (en)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of GB201311399D0
Publication of GB2500532A
Application granted
Publication of GB2500532B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/46 Conversion to or from run-length codes, i.e. by representing the number of consecutive digits, or groups of digits, of the same kind by a code word and a digit indicative of that kind

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Data stored in a column-oriented manner is encoded using a data mining algorithm that finds column patterns among a set of data tuples. Each data tuple contains a set of columns, and the data mining algorithm treats all columns, all column combinations and all column orderings similarly or in the same manner when looking for column patterns. The column values occurring in the column patterns are ordered by frequency into a prefix tree, and the prefix tree defines a pattern order. The data tuples are sorted according to the pattern order, and the columns of the resulting sorted data tuples are encoded using run-length encoding.
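The sort-then-encode idea in the abstract can be illustrated with a minimal Python sketch. This is a simplification under stated assumptions, not the patented method: it skips the pattern-mining step and the explicit prefix tree, and instead ranks each tuple's (column, value) pairs by global frequency, which is roughly the order in which they would be inserted into a frequency-ordered prefix tree. The function names (`sort_by_pattern_order`, `run_length_encode`, `encode_columns`) and the sample rows are hypothetical.

```python
from collections import Counter
from itertools import groupby


def sort_by_pattern_order(rows):
    """Sort tuples so that rows sharing frequent column values become adjacent."""
    # Frequency of every (column index, value) pair across all tuples.
    freq = Counter((c, v) for row in rows for c, v in enumerate(row))

    def key(row):
        # Rank this row's items from most to least frequent. This stands in
        # for the path the row would take through a frequency-ordered prefix
        # tree (an assumption of this sketch). repr() keeps values of mixed
        # types comparable.
        return sorted((-freq[(c, v)], c, repr(v)) for c, v in enumerate(row))

    return sorted(rows, key=key)


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run length) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]


def encode_columns(rows):
    """Sort the tuples, then run-length encode each column separately."""
    ordered = sort_by_pattern_order(rows)
    return [run_length_encode(column) for column in zip(*ordered)]


# Hypothetical sample data: (country, colour) tuples.
rows = [
    ("DE", "red"), ("US", "blue"), ("DE", "red"),
    ("US", "red"), ("DE", "blue"), ("DE", "red"),
]
for column in encode_columns(rows):
    print(column)
```

Sorting first matters because run-length encoding only pays off when equal values sit next to each other. On the sample rows, the sorted columns need fewer runs in total than the same columns encoded in their original order.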
GB1311399.8A (priority date 2010-12-03, filing date 2011-11-03): Method and data processing system for encoding data stored in a column-oriented manner, data processing program and computer program product. Active, GB2500532B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP10193677 2010-12-03
PCT/EP2011/069324 WO2012072364A1 (en) 2010-12-03 2011-11-03 Method and data processing system for encoding data stored in a column-oriented manner, data processing program and computer program product

Publications (3)

Publication Number Publication Date
GB201311399D0 (en) 2013-08-14
GB2500532A (en) 2013-09-25
GB2500532B (en) 2018-02-21

Family

ID=44907874

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1311399.8A Active GB2500532B (en) 2010-12-03 2011-11-03 Method and data processing system for encoding data stored in a column-oriented manner, data processing program and computer program product

Country Status (4)

Country Link
US (1) US9325344B2 (en)
DE (1) DE112011104005T5 (en)
GB (1) GB2500532B (en)
WO (1) WO2012072364A1 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8805811B2 (en) * 2012-04-30 2014-08-12 Hewlett-Packard Development Company, L.P. Executing user-defined function on a plurality of database tuples
JP5826114B2 (en) * 2012-05-25 2015-12-02 クラリオン株式会社 Data decompression device, data compression device, data decompression program, data compression program, and compressed data distribution system
US20140173033A1 (en) * 2012-12-17 2014-06-19 Salesforce.Com, Inc. System, method and computer program product for processing data in a dynamic and generic manner
US10019457B1 (en) 2013-01-22 2018-07-10 Amazon Technologies, Inc. Multi-level compression for storing data in a data store
US9195711B2 (en) 2013-03-11 2015-11-24 International Business Machines Corporation Persisting and retrieving arbitrary slices of nested structures using a column-oriented data store
US9384204B2 (en) 2013-05-22 2016-07-05 Amazon Technologies, Inc. Efficient data compression and analysis as a service
KR101522870B1 (en) * 2013-10-01 2015-05-26 주식회사 파수닷컴 Apparatus and method for encrypting data column
GB201322057D0 (en) * 2013-12-13 2014-01-29 Qatar Foundation Descriptive and prescriptive data cleaning
GB201409214D0 (en) * 2014-05-23 2014-07-09 Ibm A method and system for processing a data set
US10303685B2 (en) * 2015-06-08 2019-05-28 International Business Machines Corporation Data table performance optimization
US10235100B2 (en) * 2016-08-23 2019-03-19 Sap Se Optimizing column based database table compression
US10515092B2 (en) * 2017-07-21 2019-12-24 Google Llc Structured record compression and retrieval
US20190050436A1 (en) * 2017-08-14 2019-02-14 International Business Machines Corporation Content-based predictive organization of column families
US11755927B2 (en) * 2019-08-23 2023-09-12 Bank Of America Corporation Identifying entitlement rules based on a frequent pattern tree
US11386111B1 (en) * 2020-02-11 2022-07-12 Massachusetts Mutual Life Insurance Company Systems, devices, and methods for data analytics
CN112347104B (en) * 2020-11-06 2023-09-29 中国人民大学 Column storage layout optimization method based on deep reinforcement learning
US20220350802A1 (en) * 2021-04-29 2022-11-03 International Business Machines Corporation Query performance
CN115167755A (en) * 2022-05-05 2022-10-11 山东大学 Ethernet workshop data storage method and system based on shared prefix
US12306812B2 (en) * 2023-10-27 2025-05-20 International Business Machines Corporation In-database data cleansing and independent store of clean data
US12360969B2 (en) 2023-10-27 2025-07-15 International Business Machines Corporation In-database data cleansing

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001086577A2 (en) * 2000-05-10 2001-11-15 E. I. Du Pont De Nemours And Company Method of discovering patterns in symbol sequences
US7024414B2 (en) 2001-08-06 2006-04-04 Sensage, Inc. Storage of row-column data
US7539685B2 (en) 2003-12-30 2009-05-26 Microsoft Corporation Index key normalization
US7496589B1 (en) 2005-07-09 2009-02-24 Google Inc. Highly compressed randomly accessed storage of large tables with arbitrary columns
US9195695B2 (en) * 2006-09-15 2015-11-24 Ibm International Group B.V. Technique for compressing columns of data
US8266147B2 (en) 2006-09-18 2012-09-11 Infobright, Inc. Methods and systems for database organization
US7730106B2 (en) 2006-12-28 2010-06-01 Teradata Us, Inc. Compression of encrypted data in database management systems
US7769729B2 (en) 2007-05-21 2010-08-03 Sap Ag Block compression of tables with repeated values
US8356060B2 (en) * 2009-04-30 2013-01-15 Oracle International Corporation Compression analyzer

Also Published As

Publication number Publication date
US20120143913A1 (en) 2012-06-07
GB2500532A (en) 2013-09-25
WO2012072364A1 (en) 2012-06-07
GB2500532B (en) 2018-02-21
DE112011104005T5 (en) 2013-08-29
US9325344B2 (en) 2016-04-26

Similar Documents

Publication Publication Date Title
GB201311399D0 (en) Method and data processing system for encoding data stored in a column-oriented manner, data processing program and computer program product
WO2013001535A3 (en) System, method and data structure for fast loading, storing and access to huge data sets in real time
WO2013090233A8 (en) Distributed computing in a distributed storage and task network
IN2014DN09960A (en)
Maruyama et al. ESP-index: A compressed index based on edit-sensitive parsing
GB2515938A (en) A multi-layer system for symbol-space based compression of patterns
IN2013CH04496A (en)
MX2021006632A (en) Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device.
WO2013016679A3 (en) Systems and methods for generating and using a digital pass
GB2540298A (en) Signature retrieval and matching for media monitoring
WO2011077300A3 (en) Processing of geological data
WO2014043366A3 (en) Optimal data representation and auxiliary structures for in-memory database query processing
WO2014144833A3 (en) Taste profile attributes
PH12019500795A1 (en) Method and system for the transmission of bioinformatics data
WO2014188290A3 (en) Fast and secure retrieval of dna sequences
EP3242227A4 (en) Page querying method and data processing node in oltp cluster database
IL241640B (en) Method for executing queries on streaming data using graphic processing units
WO2015044442A3 (en) Method for generating a sequence of binary code words of a multi-bit code for a control signal for a consumer
MY182481A (en) Transmitter and shortening method thereof
GB2562352A (en) Post-decoding error check with diagnostics for product codes
WO2013134662A3 (en) Systems and methods for creating a temporal content profile
IL230741B (en) Systems and methods for keyword spotting using alternating search algorithms
Wei et al. Robust forecast combinations
Couceiro et al. Decompositions of functions based on arity gap
WO2011133302A3 (en) Multi-threaded sort of data items in spreadsheet tables

Legal Events

Date Code Title Description
746 Register noted 'licences of right' (sect. 46/1977)

Effective date: 20180306