CN109815295A - Distributed cluster data import method and device - Google Patents
- Publication number
- CN109815295A (application CN201910119281.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- load
- file
- foreigntablescan
- operator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a distributed cluster data import method and device. The method comprises: a data node receives a data load command issued by the Master node and starts a ForeignTableScan operator that connects to a file loading process; based on a preset external table, the ForeignTableScan operator sends the data to be requested and the related external file information to the file loading process, wherein the file loading process is deployed on a third-party ETL server; the file loading process reads the data files in sequence according to the information sent by the data node and sends the data to the data node; after the ForeignTableScan operator of the data node has received the data, the data is stored locally.
Description
Technical field
The present invention relates to the field of computers, and in particular to a distributed cluster data import method and device.
Background
A distributed cluster database is characterized mainly by fast storage of massive data and fast response to complex queries. Fast data storage is therefore of great importance to a distributed database. The distributed database KingbaseAnalyticsDB stores and processes large volumes of data by distributing data and processing work across multiple servers or hosts. KingbaseAnalyticsDB is built on multiple single-node databases that work together and present themselves to the user as a single database.
Fig. 1 shows the components that make up a KingbaseAnalyticsDB database system. The Master node is the entry point of the system: it is the database instance to which clients connect and submit SQL statements. The Master coordinates its own work and that of the other database instances in the system, which are called data nodes (Segment nodes) and which store and process the actual data. Each KingbaseAnalyticsDB Segment instance is an independent database; each Segment node stores a portion of the data and performs most of the query processing.
When a user connects to the database and issues a query through the Master node, each Segment node creates processes to handle that query. User-defined tables and their indexes are distributed across all available Segment nodes in the database system, and each Segment stores a different portion of the data. In a KingbaseAnalyticsDB database system, users interact with the Segment nodes through the Master node. The Master node is also called the management node, and a Segment node is also called a data node or compute node.
The Copy command loads data from files in the file system into the database. The Copy command first parses the data in the data file line by line on the Master node and assembles each line into a tuple in the database's internal storage format. It then computes, from the table's distribution key, the data node to which the tuple must be sent, and that data node finally stores the data.
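The serial Copy path described above can be sketched as follows. This is only an illustration under stated assumptions: the comma delimiter, the CRC32-modulo placement rule, and the function names are hypothetical and are not taken from KingbaseAnalyticsDB.

```python
import zlib


def parse_line(line, delimiter=","):
    """Split one text line into a tuple of column values."""
    return tuple(line.rstrip("\n").split(delimiter))


def target_node(row, key_index, num_nodes):
    """Pick a data node from the distribution-key column.

    CRC32 modulo the node count is an assumed placement rule, used here
    only because it is stable across runs.
    """
    return zlib.crc32(row[key_index].encode("utf-8")) % num_nodes


def copy_on_master(lines, key_index, num_nodes):
    """Parse every line on the Master and bucket rows by destination node."""
    per_node = {node: [] for node in range(num_nodes)}
    for line in lines:  # one row at a time: all parsing happens on the Master
        row = parse_line(line)
        per_node[target_node(row, key_index, num_nodes)].append(row)
    return per_node


lines = ["1,alice\n", "2,bob\n", "3,carol\n", "4,dave\n"]
buckets = copy_on_master(lines, key_index=0, num_nodes=2)
```

Every row passes through the single loop on the Master before any data node sees it, which is the behavior the drawbacks listed next refer to.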
This scheme is the conventional way to load external data and can also be used in a distributed database, but it has practical drawbacks:
1. The Copy command requires the Master node to parse the data first and to compute, according to the table's distribution mode, which data node each row is sent to. Copy processes rows serially, so it cannot make full use of data node resources; the data nodes sit idle much of the time, and loading performance is relatively low.
2. The Master node easily becomes a bottleneck. The Master node is the entry point of the distributed database, and all queries pass through it. Storing massive data by executing the Copy command over a single connection occupies substantial hardware resources; under high concurrency, the Master node becomes the bottleneck of the distributed database. This not only degrades data loading performance but also increases the response time of other types of SQL.
3. The data file read by the Copy command must reside on the Master node's host. To use the Copy command, the user must first upload the data file to the Master node host, which adds to the storage load on that host and makes the command less convenient to use.
Summary of the invention
Embodiments of the present invention provide a distributed cluster data import method and device to solve the problem in the prior art that data loading in a distributed database cluster system is slow.
An embodiment of the present invention provides a distributed cluster data import method, comprising:
a data node receives a data load command issued by the Master node and starts a ForeignTableScan operator that connects to a file loading process; based on a preset external table, the ForeignTableScan operator sends the data to be requested and the related external file information to the file loading process, wherein the file loading process is deployed on a third-party ETL server;
the file loading process reads the data files in sequence according to the information sent by the data node and sends the data to the data node;
after the ForeignTableScan operator of the data node has received the data, the data is stored locally.
Preferably, the external table stores information about the loading process, specifically: the port number and IP address of the loading process and the list of external files to be loaded.
Preferably, starting the ForeignTableScan operator that connects to the file loading process specifically comprises:
starting the ForeignTableScan operator;
the ForeignTableScan operator connecting to the file loading process in round-robin (polling) fashion according to its own node ID.
Preferably, storing the data locally after the ForeignTableScan operator of the data node has received it specifically comprises:
the ForeignTableScan operator of the data node interpreting the data returned by the file loading process, converting it into internal tuples, storing them locally, and carrying out the next SQL operation.
An embodiment of the present invention also provides a distributed cluster data import device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the above method.
With the embodiments of the present invention, when massive data is stored, the data bypasses the Master node and is inserted directly into the Segment nodes. All Segment nodes can receive and process data at the same time, which makes full use of the data nodes' hardware resources; the concurrency of the file loading processes can also be increased, greatly raising data loading speed.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a component diagram of a KingbaseAnalyticsDB database system in the prior art;
Fig. 2 is a schematic diagram of the overall structure of external data loading for a distributed database in an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the invention provide a distributed cluster data import method and device that store a user's massive data quickly. When used in a cluster, the file loading processes make full use of data node resources, and the data flow bypasses the Master and is loaded directly into the data nodes.
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be conveyed fully to those skilled in the art.
The overall structure of external data loading for a distributed database is shown in Fig. 2. The embodiment of the invention provides two parts:
1. The file loading process. The file loading process is an HTTP service process, and each compute node (data node) acts as an HTTP client. The Master node issues the data load command to the compute nodes. After receiving the load command, a compute node sends the data to be requested and the related external file information to the file loading process. After receiving the HTTP request, the loading process reads the data files in sequence according to the information sent by the client and sends the data to the compute node. Once the compute node has received the data, the remaining processing is similar to the Master node executing the Copy command, except that the data is stored locally and is no longer forwarded to other nodes.
2. An external table mechanism in the cluster. The external table records information about the loading process, including the loading process's port number and IP address and the list of external files to be loaded. With the external table mechanism, the cluster can start a ForeignTableScan operator on each compute node, and the ForeignTableScan operators connect to the file loading processes in parallel and load data concurrently.
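The two parts above can be illustrated together with a minimal sketch built on the Python standard library. It assumes a JSON request body, a `/load` path, and an in-memory `FILES` mapping standing in for the external files; none of these details comes from the patent, which states only that the loader is an HTTP service and that the external table records the loader's IP address, port number, and file list.

```python
import http.client
import json
import threading
from dataclasses import dataclass
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple

# Stand-in for the external data files held on the loader host (made-up content).
FILES = {"orders_1.csv": b"1,alice\n2,bob\n", "orders_2.csv": b"3,carol\n"}


@dataclass(frozen=True)
class ExternalTable:
    """The loader information the text says an external table records."""
    loader_ip: str
    loader_port: int
    files: Tuple[str, ...]


class LoaderHandler(BaseHTTPRequestHandler):
    """File loading process: an HTTP service returning the requested files in order."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        wanted = json.loads(self.rfile.read(length))["files"]
        body = b"".join(FILES[name] for name in wanted)  # files read sequentially
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def load_from(ext: ExternalTable) -> bytes:
    """Data node side: act as HTTP client, send the file list, receive the data."""
    conn = http.client.HTTPConnection(ext.loader_ip, ext.loader_port, timeout=5)
    try:
        payload = json.dumps({"files": list(ext.files)}).encode("utf-8")
        conn.request("POST", "/load", body=payload,
                     headers={"Content-Type": "application/json"})
        return conn.getresponse().read()
    finally:
        conn.close()


# Port 0 asks the OS for any free port; the external table then records it.
server = ThreadingHTTPServer(("127.0.0.1", 0), LoaderHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    ext = ExternalTable("127.0.0.1", server.server_address[1],
                        ("orders_1.csv", "orders_2.csv"))
    received = load_from(ext)
finally:
    server.shutdown()
    server.server_close()
```

In this sketch the data never touches a coordinating node: the client that asks for the files is the same node that would store the resulting rows.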
The file loader program (process) and the external table access module of the distributed database are described below.
File loader program:
The file loading process resides on a third-party ETL server. The loader program executes as follows:
1) It starts a service on a specific network port.
2) The compute nodes of the cluster connect to the network service.
3) The cluster compute nodes send read instructions.
4) The file loading process sends data to the compute nodes in blocks of a configurable size (for example, 4 MB).
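Step 4 amounts to reading the listed files in order and emitting fixed-size blocks. A minimal sketch follows; the function name `iter_chunks` and the demo files are hypothetical, and only the configurable block size with 4 MB as an example comes from the text.

```python
import os
import tempfile

DEFAULT_CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the example size given in the text


def iter_chunks(paths, chunk_size=DEFAULT_CHUNK_SIZE):
    """Read the listed files in order and yield blocks of at most chunk_size bytes."""
    for path in paths:
        with open(path, "rb") as f:
            while True:
                block = f.read(chunk_size)
                if not block:  # end of this file: move on to the next one
                    break
                yield block


# Demo with two small temporary files and an 8-byte block size.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, payload in [("a.csv", b"1,alice\n2,bob\n"), ("b.csv", b"3,carol\n")]:
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    chunks = list(iter_chunks(paths, chunk_size=8))
```

Note that a block boundary can fall in the middle of a row in general, so the receiving side has to buffer incomplete rows before converting them to tuples; how the patented system handles this is not stated.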
External table access:
The cluster system implements the ForeignTableScan operator. In the optimizer, if the table being accessed is an external data table, this operator is used to access the data.
The ForeignTableScan operator works as follows:
1) Within ForeignTableScan, the operator connects to a data loading service process in round-robin (polling) fashion according to its own node ID.
2) It sends the read-data command.
3) It interprets the returned data, converts it into internal tuples, and carries out the next SQL operation.
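The three operator steps can be sketched as below. The modulo rule for the round-robin choice, the comma-delimited UTF-8 row format, and the `fetch` callback standing in for the network read are all assumptions made for illustration; the source states only that the connection is chosen by polling on the node ID.

```python
def pick_loader(node_id, loaders):
    """Step 1: spread node IDs over the loader list (assumed modulo round-robin)."""
    return loaders[node_id % len(loaders)]


def to_tuples(payload, delimiter=","):
    """Step 3: interpret returned bytes as delimited text rows and build tuples."""
    text = payload.decode("utf-8")
    return [tuple(line.split(delimiter)) for line in text.splitlines() if line]


def foreign_table_scan(node_id, loaders, fetch):
    """Connect to a loader, request data, and return tuples for local storage.

    fetch(loader, node_id) is a hypothetical stand-in for step 2, the read
    command sent over the network.
    """
    loader = pick_loader(node_id, loaders)
    payload = fetch(loader, node_id)
    return to_tuples(payload)


def fake_fetch(loader, node_id):
    """Pretend loader reply: one row naming the node and the loader it used."""
    return f"{node_id},{loader}\n".encode("utf-8")


loaders = ["192.0.2.10:8080", "192.0.2.11:8080"]
rows = [foreign_table_scan(node_id, loaders, fake_fetch) for node_id in range(4)]
```

With two loaders and four data nodes, nodes 0 and 2 use the first loader and nodes 1 and 3 use the second, so adding loader processes raises loading concurrency without involving the Master node.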
In conclusion the embodiment of the present invention loads process by third party's file, make data flow without Master node
It is directly entered in back end, takes full advantage of back end resource, greatly improve data loading.Meanwhile increasing
When multiple file load processes raising data load concurrent, Master node load can't be made excessively high, become entire database
The bottleneck of system.External data file can be placed on third party's load machine, also reduce storage pressure.
Obviously, those skilled in the art should understand that the modules or steps of the invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases, the steps shown or described can be performed in an order different from the one given here, or they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The invention is thus not limited to any specific combination of hardware and software.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. For those skilled in the art, the invention may be modified and varied in many ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (5)
1. A distributed cluster data import method, characterized by comprising:
a data node receiving a data load command issued by a Master node and starting a ForeignTableScan operator that connects to a file loading process, the ForeignTableScan operator sending, based on a preset external table, the data to be requested and the related external file information to the file loading process, wherein the file loading process is deployed on a third-party ETL server;
the file loading process reading the data files in sequence according to the information sent by the data node and sending the data to the data node;
the data being stored locally after the ForeignTableScan operator of the data node has received it.
2. The method according to claim 1, characterized in that the external table stores information about the loading process, specifically: the port number and IP address of the loading process and the list of external files to be loaded.
3. The method according to claim 1, characterized in that starting the ForeignTableScan operator that connects to the file loading process specifically comprises:
starting the ForeignTableScan operator;
the ForeignTableScan operator connecting to the file loading process in round-robin (polling) fashion according to its own node ID.
4. The method according to claim 1, characterized in that storing the data locally after the ForeignTableScan operator of the data node has received it specifically comprises:
the ForeignTableScan operator of the data node interpreting the data returned by the file loading process, converting it into internal tuples, storing them locally, and carrying out the next SQL operation.
5. A distributed cluster data import device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method according to any one of claims 1 to 4.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910119281.5A CN109815295A (en) | 2019-02-18 | 2019-02-18 | Distributed cluster data import method and device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910119281.5A CN109815295A (en) | 2019-02-18 | 2019-02-18 | Distributed cluster data import method and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN109815295A true CN109815295A (en) | 2019-05-28 |
Family
ID=66606853
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910119281.5A Pending CN109815295A (en) | Distributed cluster data import method and device | 2019-02-18 | 2019-02-18 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109815295A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113342885A (en) * | 2021-06-15 | 2021-09-03 | 深圳前海微众银行股份有限公司 | Data import method, device, equipment and computer program product |
| CN118820355A (en) * | 2024-04-22 | 2024-10-22 | 中国移动通信集团设计院有限公司 | Method, device, medium and product for loading external data of distributed database |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106790489A (en) * | 2016-12-14 | 2017-05-31 | 成都华为技术有限公司 | Parallel data loading method and system |
| US9898469B1 (en) * | 2014-02-28 | 2018-02-20 | Pivotal Software, Inc. | Parallel streaming of external data |
| CN107885780A (en) * | 2017-10-12 | 2018-04-06 | 北京人大金仓信息技术股份有限公司 | A kind of performance data collection method performed for distributed query |
| CN107885460A (en) * | 2017-10-12 | 2018-04-06 | 北京人大金仓信息技术股份有限公司 | A kind of data access method of cluster |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190528 |