
US11520826B2 - Data extraction using a distributed indexing architecture for databases - Google Patents

Data extraction using a distributed indexing architecture for databases

Info

Publication number
US11520826B2
Authority
US
United States
Prior art keywords
index
data
key
tables
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US16/280,252
Other versions
US20200265087A1 (en
Inventor
Sandeep Verma
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of America Corp
Original Assignee
Bank of America Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of America Corp
Priority to US16/280,252
Assigned to BANK OF AMERICA CORPORATION. Assignment of assignors interest (see document for details). Assignors: VERMA, SANDEEP
Publication of US20200265087A1
Application granted
Publication of US11520826B2
Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9038 Presentation of query results

Definitions

  • the present disclosure relates generally to databases, and more specifically to a distributed indexing architecture for databases.
  • data may be scattered across multiple machines and/or locations. These systems are designed to support large data sets which may contain thousands or millions of records.
  • One of the technical challenges for such a big data system is associated with creating an indexing system that can also be distributed across multiple machines. Implementing a conventional index system in each machine is not a viable option because these indexing systems are typically large data structures which consume a lot of memory resources. As the index system grows, every instance of the index system would need to be simultaneously managed and updated. This process involves constant communications with other devices and frequent updates which consumes both bandwidth and processing resources.
  • the system described in the present application provides a technical solution to the technical problems discussed above by employing a distributed indexing architecture for a database.
  • the disclosed system provides several advantages which include 1) providing an architecture that allows index tables to be partitioned and distributed among multiple devices and 2) enabling the ability to perform parallel searches of index tables which reduces search times.
  • the database system provides a distributed indexing architecture that can be used to distribute index tables among multiple devices.
  • the system is configured to store data and to identify an index key and data location information for the stored data.
  • the system determines a set of index table references based on the index key.
  • Each index table reference identifies an index table where the index key and data location information may be stored.
  • the system then stores the index key and data location information in one or more of the index tables identified by the set of index table references.
  • These index tables may be distributed among and located in one or more devices.
  • each index table may consume fewer memory resources since the index tables can be partitioned and distributed among multiple devices.
  • the database system is further configured to receive a data request for data that comprises an index key that is linked with the data.
  • the system determines a set of index table references based on the index key.
  • the system searches the index tables identified by the set of index table references to determine which index table contains the index key.
  • the system may search the index tables in parallel or simultaneously to determine which index table contains the index key. Parallel searching reduces the amount of time required to search for the index key.
  • FIG. 1 is a schematic diagram of an embodiment of a system configured to implement a distributed indexing architecture for a database
  • FIG. 2 is an embodiment of a flowchart of an index key storing method
  • FIG. 3 is an illustrated example of the index storing method
  • FIG. 4 is an embodiment of a flowchart of a data retrieving method
  • FIG. 5 is an illustrated example of the data retrieving method
  • FIG. 6 is an embodiment of a device configured to implement a distributed indexing architecture for a database.
  • FIG. 1 is an example of a system configured to implement a distributed indexing architecture for a database.
  • FIGS. 2 and 3 combine to provide an example of a process for storing data and an index key for the data using the distributed indexing architecture.
  • FIGS. 4 and 5 combine to provide an example of a process for retrieving data and an index key for the data using the distributed indexing architecture.
  • FIG. 6 is an example of a device configured to implement a distributed indexing architecture for a database.
  • FIG. 1 is a schematic diagram of an embodiment of a database system 100 configured to implement a distributed indexing architecture.
  • the database system 100 provides an indexing architecture that can be used to distribute index tables among multiple devices (e.g. network devices 102).
  • the system 100 is configured to store data 101 in a data structure (e.g. data table 104) and to identify an index key and data location information for the stored data 101.
  • the system 100 determines a set of index table references based on the index key. Each index table reference identifies an index table 106 where the index key and data location information may be stored.
  • the system 100 then stores the index key and data location information in one or more of the index tables 106 identified by the set of index table references.
  • the index tables 106 may be distributed among and/or located in one or more network devices 102.
  • each index table 106 may consume fewer memory resources since the index tables 106 can be partitioned and distributed among multiple devices. Additional information about storing data 101 using distributed index tables 106 is described in FIGS. 2 and 3.
  • the database system 100 is further configured to receive a data request 108 for data 101 that comprises an index key that is linked with the data 101.
  • the system 100 determines a set of index table references based on the index key.
  • the system 100 searches the index tables 106 identified by the set of index table references to determine which index table 106 contains the index key.
  • the system 100 may search the index tables 106 in parallel or simultaneously to determine which index table 106 contains the index key. Parallel searching reduces the amount of time required to search for the index key.
  • the system 100 extracts the data location information for the data 101 that is stored with the index key.
  • the system 100 can then retrieve the data 101 from a data table 104 based on the data location information. Additional information about retrieving data 101 using distributed index tables 106 is described in FIGS. 4 and 5.
  • the database system 100 comprises one or more network devices 102 in signal communication with each other in a network 110.
  • examples of a network device 102 include, but are not limited to, computers, databases, web servers, or any other suitable type of network device.
  • One or more of the network devices 102 may be in signal communication with other devices (e.g. user devices 112).
  • a network device 102 may be configured to receive a data request 108 that comprises an index key for data 101 from a user device 112 and to send the requested data 101 to the user device 112.
  • user devices 112 include, but are not limited to, computers, mobile devices (e.g. smart phones or tablets), Internet-of-things (IoT) devices, web clients, web servers, or any other suitable type of device.
  • the network 110 is any suitable type of wireless and/or wired network including, but not limited to, all or a portion of the Internet, an Intranet, a private network, a public network, a peer-to-peer network, the public switched telephone network, a cellular network, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), and a satellite network.
  • the network 110 may be configured to support any suitable type of communication protocol as would be appreciated by one of ordinary skill in the art upon viewing this disclosure.
  • FIG. 2 is an embodiment of a flowchart of an index key storing method 200.
  • Method 200 may be implemented by an indexing engine 608 in a device (e.g. network device 102) for storing data 101 and index keys linked with the data 101 using a distributed index table architecture.
  • the indexing engine 608 stores data 101 in a data table 104.
  • the indexing engine 608 may receive the data 101 from another device (e.g. a network device 102 or user device 112) and store the data 101 in response to receiving the data 101.
  • the indexing engine 608 may be implemented on the device that generates or provides the data 101 to the indexing engine 608. Referring to FIG. 3 as an example, the indexing engine 608 stores data 101 (shown as ‘Data 6’) in a data table 104.
  • the data 101 is stored at a particular data location (shown as ‘row 6’) and is linked with an index key 304 (shown as ‘abcd’).
  • the indexing engine 608 stores the data 101 in the data table 104 by appending the data 101 to the end of the data table 104 without sorting the data table 104. In other words, the indexing engine 608 adds the data 101 to the bottom of the data table 104 without reorganizing the data 101 in the data table 104.
  • the indexing engine 608 determines an index key 304 and data location information 306 for the data 101.
  • the index key 304 is an alphanumeric or numeric identifier that is uniquely linked with data 101.
  • the index key 304 may comprise a string of characters.
  • the data location information 306 may be any suitable information that indicates the location of where the data 101 is stored.
  • the data location information 306 may comprise a device identifier (e.g. a device name, a MAC address, or an IP address), a file name, a data table 104, and/or a location (e.g. a row or column) within the data table 104.
  • the indexing engine 608 determines that the data 101 is stored at ‘row 6’ in the data table 104 and is associated with an index key 304 value of ‘abcd.’
  • the indexing engine 608 determines a set of index table references 307 based on the index key 304.
  • Each index table reference 307 corresponds with and identifies an index table 106.
  • An index table 106 is a data structure that links index keys 304 with data location information 306.
  • the index table 106 is a table data structure.
  • the index table 106 may be a file, a document, or any other suitable type of data structure.
  • Each index table reference 307 may be an identifier that comprises one or more characters.
  • an index table reference 307 comprises fewer characters than the number of characters in the index key 304.
  • the indexing engine 608 determines an index table reference 307 by determining a mask size 309 and extracting a set of characters equal to the mask size 309 from the index key 304 starting from the beginning of the index key 304.
  • the indexing engine 608 may also replace one or more characters from the set of extracted characters with a wildcard character (e.g. ‘*’).
  • a wildcard character is a placeholder character that can represent any character.
  • the mask size 309 in this example is two.
  • the indexing engine 608 extracts the first two characters (i.e. ‘a’ and ‘b’) from the index key ‘abcd.’
  • the indexing engine 608 then generates different combinations of the extracted characters by replacing one or more of the extracted characters with a wildcard character (e.g. ‘*’) to generate index table references 307.
  • the indexing engine 608 generates a set of index table references 307 that comprises ‘**’ (shown as ‘*’), ‘a*’, ‘*b’, and ‘ab’.
  • the mask size may be set to any other suitable value.
  • the indexing engine 608 identifies a set of index tables 106 corresponding with the set of index table references 307.
  • the indexing engine 608 identifies index tables 106A, 106B, 106C, and 106D that each correspond with an index table reference 307 from the set of index table references 307.
  • the index tables 106A-106D may be stored on one or more network devices 102.
  • identifying the set of index tables 106 comprises identifying network devices 102 where one or more index tables 106 are being stored.
  • index tables 106A and 106B may be stored in a first network device 102 and index tables 106C and 106D may be stored in a second network device 102.
  • all of the identified index tables 106A-106D may be stored in a single network device 102.
  • the indexing engine 608 stores the index key 304 and the data location information 306 in one or more of the identified index tables 106.
  • the indexing engine 608 stores the index key 304 and the data location information 306 by identifying an index table 106 with the fewest number of entries and storing the index key 304 and the data location information 306 in the identified index table 106.
  • the indexing engine 608 performs load balancing by storing index keys 304 and data location information 306 in index tables 106 with more storage capacity.
  • the indexing engine 608 stores the index key 304 and the data location information 306 by sending the index key 304 and the data location information 306 to another network device 102 that contains the index table 106.
  • the indexing engine 608 may store the index key 304 and the data location information 306 in more than one index table 106.
  • the indexing engine 608 creates duplicate entries in multiple index tables 106 which provides redundancy and allows for reduced search times since the index key 304 and data location information 306 can be retrieved from multiple sources.
  • the indexing engine 608 stores the index key 304 and the data location information 306 in index table 106B.
  • FIG. 4 is an embodiment of a flowchart of a data retrieving method 400.
  • Method 400 may be implemented by an indexing engine 608 in a device (e.g. network device 102) for retrieving data 101 and index keys 304 linked with the data 101 using a distributed index table architecture.
  • the indexing engine 608 receives an index key 304 for data 101.
  • the indexing engine 608 may receive a data request 108 from a network device 102 or a user device 112.
  • the data request 108 comprises an index key 304 for data 101.
  • the indexing engine 608 may receive an index key 304 with a value of ‘abcd.’
  • the indexing engine 608 determines a set of index table references 307.
  • the indexing engine 608 may determine the set of index table references 307 using a process similar to the process described in step 206 of FIG. 2. Continuing with the example from FIG. 5, the indexing engine 608 determines a set of index table references 307 that comprises ‘**’ (shown as ‘*’), ‘a*’, ‘*b’, and ‘ab’.
  • the indexing engine 608 identifies a set of index tables 106 corresponding with the set of index table references 307.
  • the indexing engine 608 may identify the set of index tables 106 using a process similar to the process described in step 208 of FIG. 2.
  • the indexing engine 608 searches the set of index tables 106 using the index key 304 to identify an index table 106 that contains the index key 304.
  • the indexing engine 608 may use the index key 304 as a token for searching the set of index tables 106.
  • the indexing engine 608 may compare the received index key 304 to the index keys 304 stored in each index table 106 to determine which index table 106 contains the index key 304.
  • the indexing engine 608 searches the set of index tables 106 in parallel or at the same time. In this case, the indexing engine 608 may use multiple processors or devices to simultaneously search the set of index tables 106.
  • Searching multiple index tables 106 in parallel reduces the amount of time required to identify an index table 106 that contains the index key 304.
  • the indexing engine 608 searches index tables 106A-106D to determine which index table 106 contains the index key 304.
  • index table 106B contains the index key 304.
  • the indexing engine 608 identifies data location information 306 based on the search results.
  • the index key 304 is linked with the data location information 306 for the data 101.
  • the data location information 306 may identify a device identifier (e.g. a device name, a MAC address, or an IP address), a file name, a data table 104, and/or a location (e.g. a row or column) within the data table 104.
  • the index key 304 is linked with data location information 306 corresponding with ‘row 6’ in the data table 104.
  • the indexing engine 608 retrieves the data 101 from the data table 104 based on the data location information 306.
  • the indexing engine 608 retrieves the data 101 stored in the data table 104 at the data location identified by the data location information 306.
  • the indexing engine 608 retrieves the data 101 (shown as ‘Data 6’) from the data table 104 at the data location (i.e. row 6) identified by the data location information 306.
  • the indexing engine 608 outputs the retrieved data 101.
  • the indexing engine 608 may send the data 101 to the device that originally requested the data 101.
  • FIG. 6 is an embodiment of a device 600 (e.g. network device 102) configured to implement a distributed indexing architecture for a database.
  • the device 600 comprises a processor 602, a memory 604, and a network interface 606.
  • the device 600 may be configured as shown or in any other suitable configuration.
  • the processor 602 comprises one or more processors operably coupled to the memory 604.
  • the processor 602 is any electronic circuitry including, but not limited to, state machines, one or more central processing unit (CPU) chips, logic units, cores (e.g. a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or digital signal processors (DSPs).
  • the processor 602 may be a programmable logic device, a microcontroller, a microprocessor, or any suitable combination of the preceding.
  • the processor 602 is communicatively coupled to and in signal communication with the memory 604.
  • the one or more processors are configured to process data and may be implemented in hardware or software.
  • the processor 602 may be 8-bit, 16-bit, 32-bit, 64-bit or of any other suitable architecture.
  • the processor 602 may include an arithmetic logic unit (ALU) for performing arithmetic and logic operations, processor registers that supply operands to the ALU and store the results of ALU operations, and a control unit that fetches instructions from memory and executes them by directing the coordinated operations of the ALU, registers and other components.
  • the one or more processors are configured to implement various instructions.
  • the one or more processors are configured to execute instructions to implement the indexing engine 608.
  • processor 602 may be a special purpose computer designed to implement the functions disclosed herein.
  • the indexing engine 608 is implemented using logic units, FPGAs, ASICs, DSPs, or any other suitable hardware.
  • the indexing engine 608 is configured as described in FIGS. 2-4.
  • the memory 604 comprises one or more disks, tape drives, or solid-state drives, and may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
  • the memory 604 may be volatile or non-volatile and may comprise read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), dynamic random-access memory (DRAM), and static random-access memory (SRAM).
  • the memory 604 is operable to store indexing instructions 610, index tables 106, data tables 104, and/or any other data or instructions.
  • the indexing instructions 610 may comprise any suitable set of instructions, logic, rules, or code operable to execute the indexing engine 608.
  • the index tables 106 and the data tables 104 are configured similar to the index tables 106 and the data tables 104 described in FIGS. 2-4.
  • the network interface 606 is configured to enable wired and/or wireless communications.
  • the network interface 606 is configured to communicate data between the device 600 and other devices (e.g. network devices 102 or user devices 112), systems, or domains.
  • the network interface 606 may comprise a WIFI interface, a local area network (LAN) interface, a wide area network (WAN) interface, a modem, a switch, or a router.
  • the processor 602 is configured to send and receive data using the network interface 606 .
  • the network interface 606 may be configured to use any suitable type of communication protocol as would be appreciated by one of ordinary skill in the art.
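
The derivation of index table references 307 described above for method 200 (take the first mask-size characters of the index key, then replace any subset of them with a wildcard) can be sketched in a few lines of Python. This is only an illustrative reading of the text, not the patented implementation: the function name `index_table_references`, the `WILDCARD` constant, and the choice to enumerate every keep-or-wildcard combination are assumptions made for the example.

```python
from itertools import product

WILDCARD = "*"  # placeholder character, as in the 'abcd' example


def index_table_references(index_key, mask_size=2):
    """Return every wildcard combination of the first `mask_size` characters.

    Hypothetical helper: the source only states that extracted characters
    may be replaced with wildcards to form index table references.
    """
    if not 0 < mask_size <= len(index_key):
        raise ValueError("mask_size must be between 1 and len(index_key)")
    prefix = index_key[:mask_size]
    # For each extracted position, either use the wildcard or keep the character.
    choices = [(WILDCARD, ch) for ch in prefix]
    return ["".join(combo) for combo in product(*choices)]


refs = index_table_references("abcd", mask_size=2)
print(refs)  # ['**', '*b', 'a*', 'ab']
```

With a mask size of two, the key 'abcd' yields the same four references as the FIG. 3 example. Under this reading a mask size of n produces 2^n candidate index tables, so the mask size directly controls how widely the index can be partitioned.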

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data retrieval device that includes a memory operable to store a data table and an indexing engine implemented by a processor. The indexing engine is configured to receive an index key for data and to determine a set of index table references based on the index key. Each index table reference identifies an index table that links index keys with data location information. The indexing engine is further configured to identify a set of index tables corresponding with the set of index table references and to identify an index table from the set of index tables that contains the index key. The indexing engine is further configured to retrieve a data location information linked with the index key from the index table, to retrieve the data from the data table based on the data location information, and to output the retrieved data.

Description

TECHNICAL FIELD
The present disclosure relates generally to databases, and more specifically to a distributed indexing architecture for databases.
BACKGROUND
In conventional big data systems data may be scattered across multiple machines and/or locations. These systems are designed to support large data sets which may contain thousands or millions of records. One of the technical challenges for such a big data system is associated with creating an indexing system that can also be distributed across multiple machines. Implementing a conventional index system in each machine is not a viable option because these indexing systems are typically large data structures which consume a lot of memory resources. As the index system grows, every instance of the index system would need to be simultaneously managed and updated. This process involves constant communications with other devices and frequent updates which consumes both bandwidth and processing resources.
Thus, it is desirable to provide a technical solution that provides the ability to implement an indexing architecture that can be distributed among multiple devices.
SUMMARY
In conventional big data systems data may be scattered across multiple machines and/or locations. These systems are designed to support large data sets which may contain thousands or millions of records. One of the technical challenges for such a big data system is associated with creating an indexing system that can also be distributed across multiple machines. Implementing a conventional index system in each machine is not a viable option because these indexing systems are typically large data structures which consume a lot of memory resources. As the index system grows, every instance of the index system would need to be simultaneously managed and updated. This process involves constant communications with other devices and frequent updates which consumes both bandwidth and processing resources.
Another technical challenge for big data systems is the amount of time it takes to search a conventional indexing system. Conventional indexing systems are typically implemented using a large binary tree structure. Locating information in a binary tree structure involves performing linear searches. The amount of time required for performing a linear search increases as the depth of the binary tree increases. In other words, the amount of time required to search the binary tree increases linearly as the binary tree grows over time. This means that processing resources will be occupied for longer periods of time as the indexing system grows and the search time increases. The performance of a device implementing a conventional indexing system degrades over time due to the steady increase in the amount of consumed memory and processing resources.
The system described in the present application provides a technical solution to the technical problems discussed above by employing a distributed indexing architecture for a database. The disclosed system provides several advantages which include 1) providing an architecture that allows index tables to be partitioned and distributed among multiple devices and 2) enabling the ability to perform parallel searches of index tables which reduces search times.
The database system provides a distributed indexing architecture that can be used to distribute index tables among multiple devices. The system is configured to store data and to identify an index key and data location information for the stored data. The system then determines a set of index table references based on the index key. Each index table reference identifies an index table where the index key and data location information may be stored. The system then stores the index key and data location information in one or more of the index tables identified by the set of index table references. These index tables may be distributed among and located in one or more devices. In addition, each index table may consume fewer memory resources since the index tables can be partitioned and distributed among multiple devices.
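The storing flow summarized above can be illustrated with a small in-memory model. The sketch below is hypothetical: the class `DistributedIndex`, its attributes, and the use of plain Python dictionaries as index tables are assumptions, and the "fewest entries" selection rule is taken from one embodiment described for method 200 rather than being required by the summary.

```python
from itertools import product


def index_table_references(index_key, mask_size=2):
    # Keep or wildcard each of the first `mask_size` characters of the key.
    prefix = index_key[:mask_size]
    return ["".join(c) for c in product(*[("*", ch) for ch in prefix])]


class DistributedIndex:
    """Toy model: one dict per index table, keyed by index table reference."""

    def __init__(self):
        self.data_table = []    # rows are appended, never re-sorted
        self.index_tables = {}  # reference -> {index_key: row_number}

    def store(self, index_key, data, mask_size=2):
        # 1) Append the data and record where it landed (zero-based row).
        self.data_table.append(data)
        location = len(self.data_table) - 1
        # 2) Derive the candidate index tables from the index key.
        refs = index_table_references(index_key, mask_size)
        tables = [(ref, self.index_tables.setdefault(ref, {})) for ref in refs]
        # 3) Load-balance: choose the candidate table with the fewest entries.
        ref, table = min(tables, key=lambda item: len(item[1]))
        table[index_key] = location
        return ref, location


index = DistributedIndex()
for key in ("abcd", "abzz", "axyz"):
    ref, row = index.store(key, f"data for {key}")
    print(key, "->", ref, "row", row)
```

In a real deployment each index table could live on a different device, in which case step 3 would send the key and location to that device instead of writing to a local dictionary.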
The database system is further configured to receive a data request for data that comprises an index key that is linked with the data. In response to receiving the index key, the system determines a set of index table references based on the index key. The system then searches the index tables identified by the set of index table references to determine which index table contains the index key. The system may search the index tables in parallel or simultaneously to determine which index table contains the index key. Parallel searching reduces the amount of time required to search for the index key. Once the system identifies an index table that contains the index key, the system extracts the data location information for the data that is stored with the index key. The system can then retrieve the data based on the data location information.
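The retrieval flow above, including the parallel search of candidate index tables, might look roughly like the following. Everything named here (`INDEX_TABLES`, `DATA_TABLE`, `search_table`, `retrieve`) is an assumption for illustration, as is the choice of which table holds the key. Threads from Python's standard `concurrent.futures` module stand in for the "multiple processors or devices" of the description.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def index_table_references(index_key, mask_size=2):
    # Keep or wildcard each of the first `mask_size` characters of the key.
    prefix = index_key[:mask_size]
    return ["".join(c) for c in product(*[("*", ch) for ch in prefix])]


# Hypothetical state after earlier stores: 'abcd' points at row 6.
DATA_TABLE = {6: "Data 6"}
INDEX_TABLES = {
    "**": {},
    "a*": {"abcd": 6},  # assumption: this table happens to hold the key
    "*b": {},
    "ab": {},
}


def search_table(ref, index_key):
    """Look for the key in one index table; return (ref, location or None)."""
    table = INDEX_TABLES.get(ref, {})
    return ref, table.get(index_key)


def retrieve(index_key, mask_size=2):
    refs = index_table_references(index_key, mask_size)
    # Search all candidate index tables concurrently.
    with ThreadPoolExecutor(max_workers=len(refs)) as pool:
        results = list(pool.map(lambda r: search_table(r, index_key), refs))
    # Use the first table that reported a location for the key.
    for _ref, location in results:
        if location is not None:
            return DATA_TABLE[location]
    raise KeyError(index_key)


print(retrieve("abcd"))  # Data 6
```

For in-memory dictionaries the threads add overhead rather than speed; the concurrency only pays off when each lookup is a request to a separate device or a scan of a large table.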
Certain embodiments of the present disclosure may include some, all, or none of these advantages. These advantages and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
FIG. 1 is a schematic diagram of an embodiment of a system configured to implement a distributed indexing architecture for a database;
FIG. 2 is an embodiment of a flowchart of an index key storing method;
FIG. 3 is an illustrated example of the index storing method;
FIG. 4 is an embodiment of a flowchart of a data retrieving method;
FIG. 5 is an illustrated example of the data retrieving method; and
FIG. 6 is an embodiment of a device configured to implement a distributed indexing architecture for a database.
DETAILED DESCRIPTION
The system described in the present application provides a technical solution to the technical problems discussed above by employing a distributed indexing architecture for a database. The disclosed system provides several advantages which include 1) providing an architecture that allows index tables to be partitioned and distributed among multiple devices and 2) enabling the ability to perform parallel searches of index tables which reduces search times.
FIG. 1 is an example of a system configured to implement a distributed indexing architecture for a database. FIGS. 2 and 3 combine to provide an example of a process for storing data and an index key for the data using the distributed indexing architecture. FIGS. 4 and 5 combine to provide an example of a process for retrieving data and an index key for the data using the distributed indexing architecture. FIG. 6 is an example of a device configured to implement a distributed indexing architecture for a database.
FIG. 1 is a schematic diagram of an embodiment of a database system 100 configured to implement a distributed indexing architecture. The database system 100 provides an indexing architecture that can be used to distribute index tables among multiple devices (e.g. network devices 102). The system 100 is configured to store data 101 in a data structure (e.g. data table 104) and to identify an index key and data location information for the stored data 101. The system 100 then determines a set of index table references based on the index key. Each index table reference identifies an index table 106 where the index key and data location information may be stored. The system 100 then stores the index key and data location information in one or more of the index tables 106 identified by the set of index table references. The index tables 106 may be distributed among and/or located in one or more network devices 102. In addition, each index table 106 may consume fewer memory resources because the index tables 106 can be partitioned and distributed among multiple devices. Additional information about storing data 101 using distributed index tables 106 is described in FIGS. 2 and 3 .
The database system 100 is further configured to receive a data request 108 for data 101 that comprises an index key that is linked with the data 101. In response to receiving the index key, the system 100 determines a set of index table references based on the index key. The system 100 then searches the index tables 106 identified by the set of index table references to determine which index table 106 contains the index key. The system 100 may search the index tables 106 in parallel or simultaneously to determine which index table 106 contains the index key. Parallel searching reduces the amount of time required to search for the index key. Once the system 100 identifies an index table 106 that contains the index key, the system 100 extracts the data location information for the data 101 that is stored with the index key. The system 100 can then retrieve the data 101 from a data table 104 based on the data location information. Additional information about retrieving data 101 using distributed index tables 106 is described in FIGS. 4 and 5 .
In one embodiment, the database system 100 comprises one or more network devices 102 in signal communication with each other in a network 110. Examples of a network device 102 include, but are not limited to, computers, databases, web servers, or any other suitable type of network device. One or more of the network devices 102 may be in signal communication with other devices (e.g. user devices 112). For example, a network device 102 may be configured to receive a data request 108 that comprises an index key for data 101 from a user device 112 and to send the requested data 101 to the user device 112. Examples of user devices 112 include, but are not limited to, computers, mobile devices (e.g. smart phones or tablets), Internet-of-things (IoT) devices, web clients, web servers, or any other suitable type of device.
The network 110 is any suitable type of wireless and/or wired network including, but not limited to, all or a portion of the Internet, an Intranet, a private network, a public network, a peer-to-peer network, the public switched telephone network, a cellular network, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), and a satellite network. The network 110 may be configured to support any suitable type of communication protocol as would be appreciated by one of ordinary skill in the art upon viewing this disclosure.
Data Storing Process
FIG. 2 is an embodiment of a flowchart of an index key storing method 200. Method 200 may be implemented by an indexing engine 608 in a device (e.g. network device 102) for storing data 101 and index keys linked with the data 101 using a distributed index table architecture.
At step 202, the indexing engine 608 stores data 101 in a data table 104. In one embodiment, the indexing engine 608 may receive the data 101 from another device (e.g. a network device 102 or user device 112) and store the data 101 in response to receiving the data 101. In another embodiment, the indexing engine 608 may be implemented on the device that generates or provides the data 101 to the indexing engine 608. Referring to FIG. 3 as an example, the indexing engine 608 stores data 101 (shown as ‘Data 6’) in a data table 104. The data 101 is stored at a particular data location (shown as ‘row 6’) and is linked with an index key 304 (shown as ‘abcd’). In one embodiment, the indexing engine 608 stores the data 101 in the data table 104 by appending the data 101 to the end of the data table 104 without sorting the data table 104. In other words, the indexing engine 608 adds the data 101 to the bottom of the data table 104 without reorganizing the data 101 in the data table 104.
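As an illustration of the append-only storage in step 202, the following is a minimal sketch and not the disclosed implementation. It assumes the data table 104 can be modeled as an in-memory Python list and that rows are numbered starting from one, which matches the ‘row 6’ label in the FIG. 3 example; the function name append_record is hypothetical.

```python
from typing import Any, List


def append_record(data_table: List[Any], record: Any) -> int:
    """Append a record to the end of the table and return its row number.

    Existing rows are neither sorted nor moved. Row numbers are assumed
    to be 1-based for this sketch.
    """
    data_table.append(record)
    return len(data_table)


# Five rows already exist; the new record lands in row 6.
data_table = ["Data 1", "Data 2", "Data 3", "Data 4", "Data 5"]
row = append_record(data_table, "Data 6")
```

Because the table is never reordered, the returned row number remains a stable location that can later be recorded as data location information 306.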
Returning to FIG. 2 at step 204, the indexing engine 608 determines an index key 304 and data location information 306 for the data 101. The index key 304 is an alphanumeric or numeric identifier that is uniquely linked with data 101. For example, the index key 304 may comprise a string of characters. The data location information 306 may be any suitable information that indicates the location of where the data 101 is stored. For example, the data location information 306 may comprise a device identifier (e.g. a device name, a MAC address, or an IP address), a file name, a data table 104, and/or a location (e.g. a row or column) within the data table 104. Continuing with the example from FIG. 3 , the indexing engine 608 determines that the data 101 is stored at ‘row 6’ in the data table 104 and is associated with an index key 304 value of ‘abcd.’
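One possible way to represent the pairing of an index key 304 with data location information 306 is sketched below. The class names, field names, and the sample device and table identifiers are assumptions made for illustration only; the disclosure lists the kinds of information that may be included but does not prescribe a layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataLocation:
    """Hypothetical container for data location information 306."""
    device_id: str                   # e.g. a device name, MAC address, or IP address
    table_name: str                  # identifies the data table 104
    row: int                         # location within the data table
    file_name: Optional[str] = None  # optional file name, if applicable


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index table entry linking a key to a location."""
    index_key: str
    location: DataLocation


# Entry for the FIG. 3 example; the device and table names are invented.
entry = IndexEntry(
    index_key="abcd",
    location=DataLocation(device_id="device-1", table_name="data_table_104", row=6),
)
```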
Returning to FIG. 2 at step 206, the indexing engine 608 determines a set of index table references 307 based on the index key 304. Each index table reference 307 corresponds with and identifies an index table 106. An index table 106 is a data structure that links index keys 304 with data location information 306. In one embodiment, the index table 106 is a table data structure. In other embodiments, the index table 106 may be a file, a document, or any other suitable type of data structure.
Each index table reference 307 may be an identifier that comprises one or more characters. In one embodiment, an index table reference 307 comprises fewer characters than the number of characters in the index key 304. In one embodiment, the indexing engine 608 determines an index table reference 307 by determining a mask size 309 and extracting a set of characters equal to the mask size 309 from the index key 304 starting from the beginning of the index key 304. The indexing engine 608 may also replace one or more characters from the set of extracted characters with a wildcard character (e.g. ‘*’). A wildcard character is a placeholder character that can represent any character. Continuing with the example from FIG. 3 , the mask size 309 in this example is two. Here, the indexing engine 608 extracts the first two characters (i.e. ‘a’ and ‘b’) from the index key ‘abcd.’ The indexing engine 608 then generates different combinations of the extracted characters by replacing one or more of the extracted characters with a wildcard character (e.g. ‘*’) to generate index table references 307. In this example, the indexing engine 608 generates a set of index table references 307 that comprises ‘**’ (shown as ‘*’), ‘a*’, ‘*b’, and ‘ab’. In other examples, the mask size may be set to any other suitable value.
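The reference generation described above can be sketched as follows. This is a minimal illustration, assuming that every combination of extracted and wildcard characters is wanted, as in the FIG. 3 example; the function name index_table_references and the WILDCARD constant are hypothetical.

```python
from itertools import product
from typing import List

WILDCARD = "*"


def index_table_references(index_key: str, mask_size: int) -> List[str]:
    """Return every wildcard combination of the first mask_size characters."""
    if not 0 < mask_size <= len(index_key):
        raise ValueError("mask_size must be between 1 and the key length")
    prefix = index_key[:mask_size]  # characters taken from the start of the key
    references = []
    # Each position is either masked (wildcard) or kept, giving 2**mask_size results.
    for mask in product((True, False), repeat=mask_size):
        reference = "".join(
            WILDCARD if masked else char for char, masked in zip(prefix, mask)
        )
        references.append(reference)
    return references


refs = index_table_references("abcd", 2)
```

For the index key ‘abcd’ and a mask size of two, the result contains the four references ‘**’, ‘a*’, ‘*b’, and ‘ab’. The number of references doubles with each additional masked position, so the mask size directly controls how many index tables are candidates for a given key.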
Returning to FIG. 2 at step 208, the indexing engine 608 identifies a set of index tables 106 corresponding with the set of index table references 307. Continuing with the example from FIG. 3 , the indexing engine 608 identifies index tables 106A, 106B, 106C, and 106D that each correspond with an index table reference 307 from the set of index table references 307. The index tables 106A-106D may be stored on one or more network devices 102. In this example, identifying the set of index tables 106 comprises identifying network devices 102 where one or more index tables 106 are being stored. For instance, index tables 106A and 106B may be stored in a first network device 102 and index tables 106C and 106D may be stored in a second network device 102. In other examples, all of the identified index tables 106A-106D may be stored in a single network device 102.
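A sketch of step 208 follows, assuming a simple catalog that maps each index table reference 307 to an index table 106 and the network device 102 that hosts it. The catalog contents are hypothetical: the disclosure places index tables 106A and 106B on a first device and 106C and 106D on a second device, but the assignment of particular references to particular tables shown here is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical catalog: reference -> (index table, hosting device).
CATALOG: Dict[str, Tuple[str, str]] = {
    "**": ("106A", "device-1"),
    "a*": ("106B", "device-1"),
    "*b": ("106C", "device-2"),
    "ab": ("106D", "device-2"),
}


def tables_by_device(
    references: Iterable[str], catalog: Dict[str, Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Group the index tables named by the references under their host device."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for reference in references:
        if reference in catalog:  # skip references with no table yet
            table, device = catalog[reference]
            grouped[device].append(table)
    return dict(grouped)


placement = tables_by_device(["**", "a*", "*b", "ab"], CATALOG)
```

Grouping by device means a single request can be sent to each device that holds candidate tables, rather than one request per table.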
Returning to FIG. 2 at step 210, the indexing engine 608 stores the index key 304 and the data location information 306 in one or more of the identified index tables 106. In one embodiment, the indexing engine 608 stores the index key 304 and the data location information 306 by identifying an index table 106 with the fewest number of entries and storing the index key 304 and the data location information 306 in the identified index table 106. In this example, the indexing engine 608 performs load balancing by storing index keys 304 and data location information 306 in index tables 106 with more storage capacity. In one embodiment, the indexing engine 608 stores the index key 304 and the data location information 306 by sending the index key 304 and the data location information 306 to another network device 102 that contains the index table 106. In some embodiments, the indexing engine 608 may store the index key 304 and the data location information 306 in more than one index table 106. In this case, the indexing engine 608 creates duplicate entries in multiple index tables 106 which provides redundancy and allows for reduced search times since the index key 304 and data location information 306 can be retrieved from multiple sources. Continuing with the example from FIG. 3 , the indexing engine 608 stores the index key 304 and the data location information 306 in index table 106B.
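The load-balancing and duplication behavior of step 210 might be sketched as below. This assumes each index table 106 is modeled as an in-memory dictionary keyed by index key and that ties between equally sized tables are broken by candidate order; the function name store_index_entry and the copies parameter are hypothetical.

```python
from typing import Dict, List

IndexTable = Dict[str, str]  # index key -> data location information


def store_index_entry(
    tables: Dict[str, IndexTable],
    candidates: List[str],
    index_key: str,
    location: str,
    copies: int = 1,
) -> List[str]:
    """Store the entry in the least-populated candidate tables.

    Returns the names of the tables that received the entry. A value of
    copies greater than one creates duplicate entries for redundancy.
    """
    if not 1 <= copies <= len(candidates):
        raise ValueError("copies must be between 1 and the number of candidates")
    # sorted() is stable, so equally sized tables keep their candidate order.
    ranked = sorted(candidates, key=lambda name: len(tables[name]))
    chosen = ranked[:copies]
    for name in chosen:
        tables[name][index_key] = location
    return chosen


tables = {
    "**": {"wxyz": "row 1", "qrst": "row 2"},
    "a*": {},
    "*b": {"ebfg": "row 3"},
    "ab": {"abxy": "row 4"},
}
chosen = store_index_entry(tables, ["**", "a*", "*b", "ab"], "abcd", "row 6")
```

Here the entry is written to the empty table identified by ‘a*’. Passing copies=2 would also write it to the next least-populated candidate.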
Data Retrieval Process
FIG. 4 is an embodiment of a flowchart of a data retrieving method 400. Method 400 may be implemented by an indexing engine 608 in a device (e.g. network device 102) for retrieving data 101 and index keys 304 linked with the data 101 using a distributed index table architecture.
At step 402, the indexing engine 608 receives an index key 304 for data 101. For example, the indexing engine 608 may receive a data request 108 from a network device 102 or a user device 112. The data request 108 comprises an index key 304 for data 101. Referring to FIG. 5 as an example, the indexing engine 608 may receive an index key 304 with a value of ‘abcd.’
Returning to FIG. 4 at step 404, the indexing engine 608 determines a set of index table references 307. The indexing engine 608 may determine the set of index table references 307 using a process similar to the process described in step 206 of FIG. 2 . Continuing with the example from FIG. 5 , the indexing engine 608 determines a set of index table references 307 that comprises ‘*’, ‘a*’, ‘*b’, and ‘ab’.
Returning to FIG. 4 at step 406, the indexing engine 608 identifies a set of index tables 106 corresponding with the set of index table references 307. The indexing engine 608 may identify the set of index tables 106 using a process similar to the process described in step 208 of FIG. 2 .
At step 408, the indexing engine 608 searches the set of index tables 106 using the index key 304 to identify an index table 106 that contains the index key 304. The indexing engine 608 may use the index key 304 as a token for searching the set of index tables 106. For example, the indexing engine 608 may compare the received index key 304 to the index keys 304 stored in each index table 106 to determine which index table 106 contains the index key 304. In one embodiment, the indexing engine 608 searches the set of index tables 106 in parallel or at the same time. In this case, the indexing engine 608 may use multiple processors or devices to simultaneously search the set of index tables 106. Searching multiple index tables 106 in parallel reduces the amount of time required to identify an index table 106 that contains the index key 304. Continuing with the example from FIG. 5 , the indexing engine 608 searches index tables 106A-106D to determine which index table 106 contains the index key 304. In this example, index table 106B contains the index key 304.
Returning to FIG. 4 at step 410, the indexing engine 608 identifies data location information 306 based on the search results. In the identified index table 106, the index key 304 is linked with the data location information 306 for the data 101. The data location information 306 may identify a device identifier (e.g. a device name, a MAC address, or an IP address), a file name, a data table 104, and/or a location (e.g. a row or column) within the data table 104. Continuing with the example from FIG. 5 , the index key 304 is linked with data location information 306 corresponding with ‘row 6’ in the data table 104.
Returning to FIG. 4 at step 412, the indexing engine 608 retrieves the data 101 from the data table 104 based on the data location information 306. Here, the indexing engine 608 retrieves the data 101 stored in the data table 104 at the data location identified by the data location information 306. Continuing with the example from FIG. 5 , the indexing engine 608 retrieves the data 101 (shown as ‘Data 6’) from the data table 104 at the data location (i.e. row 6) identified by the data location information 306.
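The parallel search of step 408 can be illustrated with a thread pool that issues one lookup per candidate index table 106. This is a sketch under stated assumptions: the index tables are in-memory dictionaries standing in for tables that may reside on other devices, and the function names lookup and parallel_search are hypothetical. For in-process dictionaries the threads mainly demonstrate the fan-out pattern; the time savings described above would come from overlapping lookups against separate processors or devices.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Optional, Tuple

IndexTable = Dict[str, str]  # index key -> data location information


def lookup(name: str, table: IndexTable, index_key: str) -> Tuple[str, Optional[str]]:
    """Search a single index table; in practice this could be a remote call."""
    return name, table.get(index_key)


def parallel_search(
    tables: Dict[str, IndexTable], index_key: str
) -> Optional[Tuple[str, str]]:
    """Search all candidate tables concurrently and return the first hit."""
    with ThreadPoolExecutor(max_workers=max(1, len(tables))) as pool:
        futures = [
            pool.submit(lookup, name, table, index_key)
            for name, table in tables.items()
        ]
        for future in futures:
            name, location = future.result()
            if location is not None:
                return name, location
    return None  # no candidate table contains the key


tables = {
    "106A": {},
    "106B": {"abcd": "row 6"},
    "106C": {"abzz": "row 2"},
    "106D": {},
}
hit = parallel_search(tables, "abcd")
```

In this example the key is found in the table labeled 106B, and the returned location is then used in steps 410 and 412.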
Returning to FIG. 4 at step 414, the indexing engine 608 outputs the retrieved data 101. For example, the indexing engine 608 may send the data 101 to the device that originally requested the data 101.
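Steps 402 through 414 can be tied together in a short end-to-end sketch. It assumes in-memory dictionaries for the index tables 106, a list for the data table 104, 1-based row numbers as in the FIG. 5 example, and a sequential search for brevity; all function names are hypothetical and the sketch is not the disclosed implementation.

```python
from itertools import product
from typing import Dict, List, Optional


def references(index_key: str, mask_size: int) -> List[str]:
    """Step 404: wildcard combinations of the first mask_size characters."""
    prefix = index_key[:mask_size]
    return [
        "".join("*" if masked else char for char, masked in zip(prefix, mask))
        for mask in product((True, False), repeat=mask_size)
    ]


def retrieve(
    index_key: str,
    mask_size: int,
    index_tables: Dict[str, Dict[str, int]],
    data_table: List[str],
) -> Optional[str]:
    """Return the data linked with index_key, or None if it is not indexed."""
    for reference in references(index_key, mask_size):
        table = index_tables.get(reference)  # step 406: identify the table
        if table is None:
            continue
        row = table.get(index_key)           # steps 408-410: search and locate
        if row is not None:
            return data_table[row - 1]       # step 412: rows are 1-based here
    return None


data_table = ["Data 1", "Data 2", "Data 3", "Data 4", "Data 5", "Data 6"]
index_tables = {"**": {}, "a*": {"abcd": 6}, "*b": {}, "ab": {}}
result = retrieve("abcd", 2, index_tables, data_table)  # step 402 input: 'abcd'
```

The value returned for ‘abcd’ is ‘Data 6’, which would then be output to the requesting device in step 414.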
Data Manipulation Detection Device
FIG. 6 is an embodiment of a device 600 (e.g. network device 102) configured to implement a distributed indexing architecture for a database. The device 600 comprises a processor 602, a memory 604, and a network interface 606. The device 600 may be configured as shown or in any other suitable configuration.
The processor 602 comprises one or more processors operably coupled to the memory 604. The processor 602 is any electronic circuitry including, but not limited to, state machines, one or more central processing unit (CPU) chips, logic units, cores (e.g. a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or digital signal processors (DSPs). The processor 602 may be a programmable logic device, a microcontroller, a microprocessor, or any suitable combination of the preceding. The processor 602 is communicatively coupled to and in signal communication with the memory 604. The one or more processors are configured to process data and may be implemented in hardware or software. For example, the processor 602 may be 8-bit, 16-bit, 32-bit, 64-bit or of any other suitable architecture. The processor 602 may include an arithmetic logic unit (ALU) for performing arithmetic and logic operations, processor registers that supply operands to the ALU and store the results of ALU operations, and a control unit that fetches instructions from memory and executes them by directing the coordinated operations of the ALU, registers and other components.
The one or more processors are configured to implement various instructions. For example, the one or more processors are configured to execute instructions to implement the indexing engine 608. In this way, processor 602 may be a special purpose computer designed to implement the functions disclosed herein. In an embodiment, the indexing engine 608 is implemented using logic units, FPGAs, ASICs, DSPs, or any other suitable hardware. The indexing engine 608 is configured as described in FIGS. 2-4 .
The memory 604 comprises one or more disks, tape drives, or solid-state drives, and may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory 604 may be volatile or non-volatile and may comprise read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), dynamic random-access memory (DRAM), and static random-access memory (SRAM).
The memory 604 is operable to store indexing instructions 610, index tables 106, data tables 104, and/or any other data or instructions. The indexing instructions 610 may comprise any suitable set of instructions, logic, rules, or code operable to execute the indexing engine 608. The index tables 106 and the data tables 104 are configured similar to the index tables 106 and the data tables 104 described in FIGS. 2-4 .
The network interface 606 is configured to enable wired and/or wireless communications. The network interface 606 is configured to communicate data between the device 600 and other devices (e.g. network devices 102 or user devices 112), systems, or domains. For example, the network interface 606 may comprise a WIFI interface, a local area network (LAN) interface, a wide area network (WAN) interface, a modem, a switch, or a router. The processor 602 is configured to send and receive data using the network interface 606. The network interface 606 may be configured to use any suitable type of communication protocol as would be appreciated by one of ordinary skill in the art.
While several embodiments have been provided in the present disclosure, it should be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.
To aid the Patent Office, and any readers of any patent issued on this application in interpreting the claims appended hereto, applicants note that they do not intend any of the appended claims to invoke 35 U.S.C. § 112(f) as it exists on the date of filing hereof unless the words “means for” or “step for” are explicitly used in the particular claim.

Claims (17)

The invention claimed is:
1. A data retrieval device, comprising:
a memory device operable to store:
a plurality of data tables; and
a plurality of index tables, wherein each index table comprises:
a plurality of index keys; and
each index key is associated with data location information that identifies where data is stored; and
a hardware processor operably coupled to the memory device, configured to:
receive a data request comprising an index key for data stored in a data table,
wherein the index key comprises a string of characters;
determine a set of index table references based on the index key, wherein:
each index table reference comprises a subset of characters from the index key;
at least one of the index table references comprises a subset of characters from the index key and a wildcard character that represents any character;
each index table reference identifies an index table from among the plurality of index tables; and
determining the set of index table references comprises:
determining a mask size, wherein the mask size is equal to a number of characters for each index table reference;
extracting a set of characters from the index key starting from the beginning of the index key, wherein the number of extracted characters is equal to the mask size; and
replacing a character from the set of characters with a wild card character;
identify a set of index tables from among the plurality of index tables corresponding with the set of index table references;
identify an index table from the set of index tables that contains the index key;
retrieve a data location information linked with the index key from the index table, wherein the data location information identifies a data table and a location within the data table where the data is stored;
retrieve the data from the data table based on the data location information; and
output the retrieved data.
2. The device of claim 1, wherein identifying the index table from the set of index tables that contains the index key comprises searching the set of index tables in parallel.
3. The device of claim 1, wherein each index table reference comprises fewer characters than the number of characters in the index key.
4. The device of claim 1, wherein identifying the set of index tables comprises identifying one or more network devices storing at least one index table from the set of index tables.
5. The device of claim 1, wherein identifying the index table that contains the index key comprises comparing the index key to the index keys stored in the set of index tables.
6. The device of claim 1, wherein:
receiving the index key comprises receiving the index key from a user device; and
outputting the data comprises sending the data to the user device.
7. A data retrieval method, comprising:
receiving a data request comprising an index key for data, wherein the index key comprises a string of characters;
determining a set of index table references based on the index key, wherein:
each index table reference comprises a subset of characters from the index key;
at least one of the index table references comprises a subset of characters from the index key and a wildcard character that represents any character;
each index table reference identifies an index table from among a plurality of index tables, wherein each index table comprises:
a plurality of index keys; and
each index key is associated with data location information that identifies where data is stored; and
determining the set of index table references comprises:
determining a mask size, wherein the mask size is equal to a number of characters for each index table reference;
extracting a set of characters from the index key starting from the beginning of the index key, wherein the number of extracted characters is equal to the mask size; and
replacing a character from the set of characters with a wild card character;
identifying a set of index tables from among the plurality of index tables corresponding with the set of index table references;
identifying an index table from the set of index tables that contains the index key;
retrieving a data location information linked with the index key from the index table, wherein the data location information identifies a data table and a location within the data table where the data is stored;
retrieving the data from a data table based on the data location information; and
outputting the retrieved data.
8. The method of claim 7, wherein identifying the index table from the set of index tables that contains the index key comprises searching the set of index tables in parallel.
9. The method of claim 7, wherein each index table reference comprises fewer characters than the number of characters in the index key.
10. The method of claim 7, wherein identifying the set of index tables comprises identifying one or more network devices storing at least one index table from the set of index tables.
11. The method of claim 7, wherein identifying the index table that contains the index key comprises comparing the index key to the index keys stored in the set of index tables.
12. The method of claim 7, wherein:
receiving the index key comprises receiving the index key from a user device; and
outputting the data comprises sending the data to the user device.
13. A computer program product comprising executable instructions stored in a non-transitory computer readable medium that when executed by a processor causes the processor to:
receive a data request comprising an index key for data, wherein the index key comprises a string of characters;
determine a set of index table references based on the index key, wherein:
each index table reference comprises a subset of characters from the index key;
at least one of the index table references comprises a subset of characters from the index key and a wildcard character that represents any character;
each index table reference identifies an index table from among a plurality of index tables, wherein each index table comprises:
a plurality of index keys; and
each index key is associated with data location information that identifies where data is stored; and
determining the set of index table references comprises:
determining a mask size, wherein the mask size is equal to a number of characters for each index table reference;
extracting a set of characters from the index key starting from the beginning of the index key, wherein the number of extracted characters is equal to the mask size; and
replacing a character from the set of characters with a wild card character;
identify a set of index tables from among the plurality of index tables corresponding with the set of index table references;
identify an index table from the set of index tables that contains the index key;
retrieve a data location information linked with the index key from the index table, wherein the data location information identifies a data table and a location within the data table where the data is stored;
retrieve the data from a data table based on the data location information; and
output the retrieved data.
14. The computer program product of claim 13, wherein identifying the index table from the set of index tables that contains the index key comprises searching the set of index tables in parallel.
15. The computer program product of claim 13, wherein identifying the set of index tables comprises identifying one or more network devices storing at least one index table from the set of index tables.
16. The computer program product of claim 13, wherein identifying the index table that contains the index key comprises comparing the index key to the index keys stored in the set of index tables.
17. The computer program product of claim 13, wherein:
receiving the index key comprises receiving the index key from a user device; and
outputting the data comprises sending the data to the user device.
US16/280,252 2019-02-20 2019-02-20 Data extraction using a distributed indexing architecture for databases Active 2040-07-30 US11520826B2 (en)


Publications (2)

Publication Number Publication Date
US20200265087A1 US20200265087A1 (en) 2020-08-20
US11520826B2 true US11520826B2 (en) 2022-12-06



Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5806065A (en) 1996-05-06 1998-09-08 Microsoft Corporation Data system with distributed tree indexes and method for maintaining the indexes
US20070198520A1 (en) * 2004-03-30 2007-08-23 Mckenney Paul E Atomic renaming and moving of data files while permitting lock-free look-ups
US20070294272A1 (en) * 2006-06-09 2007-12-20 Mark John Anderson Apparatus and Method for Autonomic Index Creation, Modification and Deletion
US20100306786A1 (en) * 2006-03-31 2010-12-02 Isilon Systems, Inc. Systems and methods for notifying listeners of events
US7937375B2 (en) * 2007-07-19 2011-05-03 Oracle International Corporation Method and apparatus for masking index values in a database
US8214400B2 (en) 2005-10-21 2012-07-03 Emc Corporation Systems and methods for maintaining distributed data
US20120191860A1 (en) * 2001-01-22 2012-07-26 Traversat Bernard A Peer-to-Peer Communication Pipes
US9715515B2 (en) 2014-01-31 2017-07-25 Microsoft Technology Licensing, Llc External data access with split index
US9852147B2 (en) 2015-04-01 2017-12-26 Dropbox, Inc. Selective synchronization and distributed content item block caching for multi-premises hosting of digital content items
US9977805B1 (en) 2017-02-13 2018-05-22 Sas Institute Inc. Distributed data set indexing

Patent Citations (11)

Publication number Priority date Publication date Assignee Title
US5806065A (en) 1996-05-06 1998-09-08 Microsoft Corporation Data system with distributed tree indexes and method for maintaining the indexes
US20120191860A1 (en) * 2001-01-22 2012-07-26 Traversat Bernard A Peer-to-Peer Communication Pipes
US20070198520A1 (en) * 2004-03-30 2007-08-23 Mckenney Paul E Atomic renaming and moving of data files while permitting lock-free look-ups
US8214400B2 (en) 2005-10-21 2012-07-03 Emc Corporation Systems and methods for maintaining distributed data
US20100306786A1 (en) * 2006-03-31 2010-12-02 Isilon Systems, Inc. Systems and methods for notifying listeners of events
US20070294272A1 (en) * 2006-06-09 2007-12-20 Mark John Anderson Apparatus and Method for Autonomic Index Creation, Modification and Deletion
US7937375B2 (en) * 2007-07-19 2011-05-03 Oracle International Corporation Method and apparatus for masking index values in a database
US9715515B2 (en) 2014-01-31 2017-07-25 Microsoft Technology Licensing, Llc External data access with split index
US9852147B2 (en) 2015-04-01 2017-12-26 Dropbox, Inc. Selective synchronization and distributed content item block caching for multi-premises hosting of digital content items
US9977805B1 (en) 2017-02-13 2018-05-22 Sas Institute Inc. Distributed data set indexing
US9977807B1 (en) 2017-02-13 2018-05-22 Sas Institute Inc. Distributed data set indexing

Non-Patent Citations (1)

Title
Verma, S., "Distributed Indexing Architecture for Databases," U.S. Appl. No. 16/280,219, filed Feb. 20, 2019, 27 pages.

Also Published As

Publication number Publication date
US20200265087A1 (en) 2020-08-20

Similar Documents

Publication Publication Date Title
US11520826B2 (en) Data extraction using a distributed indexing architecture for databases
US10838940B1 (en) Balanced key range based retrieval of key-value database
US11347787B2 (en) Image retrieval method and apparatus, system, server, and storage medium
US10152497B2 (en) Bulk deduplication detection
US9077669B2 (en) Efficient lookup methods for ternary content addressable memory and associated devices and systems
US10901996B2 (en) Optimized subset processing for de-duplication
CN107943952B (en) Method for realizing full-text retrieval based on Spark framework
JP6720626B2 (en) Removal of outdated items in curated content
WO2020047317A1 (en) System and method for facilitating efficient indexing in a database system
US20160103858A1 (en) Data management system comprising a trie data structure, integrated circuits and methods therefor
CN105447166A (en) Keyword based information search method and system
US20200183934A1 (en) Efficient database searching for queries using wildcards
CN114491324A (en) Information push method, device, computer equipment and storage medium
CN117874082A (en) Method for searching associated dictionary data and related components
US10205679B2 (en) Resource object resolution management
EP3107010B1 (en) Data integration pipeline
US10990574B2 (en) Distributed indexing architecture for databases
US8489551B2 (en) Method for selecting a processor for query execution
US11921690B2 (en) Custom object paths for object storage management
CN117520112A (en) Method, device, equipment and storage medium for efficiency analysis processing of computing task
US11144593B2 (en) Indexing structure with size bucket indexes
US20160019204A1 (en) Matching large sets of words
US20250103654A1 (en) Space-optimized forest for graph databases
CN117539962B (en) Data processing method, device, computer equipment and storage medium
US11119999B2 (en) Zero-overhead hash filters

Legal Events

Date Code Title Description
AS Assignment

Owner name: BANK OF AMERICA CORPORATION, NORTH CAROLINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:VERMA, SANDEEP;REEL/FRAME:048382/0369

Effective date: 20190207

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE