CN110297818A - Method and device for constructing a data warehouse - Google Patents
Method and device for constructing a data warehouse
- Publication number
- CN110297818A (application number CN201910563806.4A)
- Authority
- CN
- China
- Prior art keywords
- theme
- source
- data
- designated key
- subject
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/211—Schema design and management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The disclosure provides a method for constructing a data warehouse, where the data warehouse includes one or more theme libraries. The method includes: setting a theme priority configuration table, which configures the priority of each specified theme attribute in each specified data source; determining, according to the priority of each specified theme attribute in each specified data source, the theme data corresponding to each specified theme attribute and the provenance of that theme data, to obtain a theme provenance table; and generating, according to the theme provenance table, a theme table that characterizes the theme library. The disclosure thereby supports vertical expansion, horizontal expansion, and traceability of the data warehouse, and improves the reliability of constructing the data warehouse.
Description
Technical field
The present disclosure relates to the field of computer communication technology, and in particular to a method and device for constructing a data warehouse.
Background
A data warehouse (Data Warehouse, abbreviated DW or DWH) is a strategic collection that provides all types of data support for decision-making processes at all levels of an enterprise.
In the related art, a data warehouse is theme-oriented: the data in it is organized by theme domain. A theme here is an aspect that users focus on when using the data warehouse to make decisions. For example, in the data warehouse of a public security system, the data can be divided into several major themes such as person, event, object, and case. Building the theme library for the person theme can be divided into the following steps: (1) based on the characteristics of a person and on public security domain knowledge, establish a wide table for the person; (2) from the existing business data, extract and collate the fields required by the wide table; (3) when multiple business tables contain the same field, select the data from the table with the highest confidence.
However, building such a theme-oriented data warehouse requires comparing the data in multiple tables and distinguishing every field in each data table. When multiple tables contain fields with the same meaning, it is necessary not only to distinguish priorities but also to determine the validity of each data record. The implementation is therefore very complex, and it also hinders vertical expansion, horizontal expansion, and traceability of the data warehouse.
Summary
To overcome the problems in the related art, the present disclosure provides a method and device for constructing a data warehouse.
According to a first aspect of the embodiments of the present disclosure, a method for constructing a data warehouse is provided, where the data warehouse includes one or more theme libraries. The method includes:
setting a theme priority configuration table, where the theme priority configuration table is used to configure the priority of each specified theme attribute in each specified data source;
determining, according to the priority of each specified theme attribute in each specified data source, the theme data corresponding to each specified theme attribute and the provenance of the theme data, to obtain a theme provenance table; and
generating, according to the theme provenance table, a theme table that characterizes the theme library.
According to a second aspect of the embodiments of the present disclosure, a device for constructing a data warehouse is provided, where the data warehouse includes one or more theme libraries. The device includes:
a setting module, configured to set a theme priority configuration table, where the theme priority configuration table is used to configure the priority of each specified theme attribute in each specified data source;
a determining module, configured to determine, according to the priority of each specified theme attribute in each specified data source, the theme data corresponding to each specified theme attribute and the provenance of the theme data, to obtain a theme provenance table; and
a generation module, configured to generate, according to the theme provenance table, a theme table that characterizes the theme library.
According to a third aspect of the embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, where the program, when executed by a processor, implements the method for constructing a data warehouse provided in the first aspect.
According to a fourth aspect of the embodiments of the present disclosure, a device for constructing a data warehouse is provided, where the data warehouse includes one or more theme libraries. The device includes:
a processor; and
a memory for storing processor-executable instructions;
where the processor is configured to:
set a theme priority configuration table, where the theme priority configuration table is used to configure the priority of each specified theme attribute in each specified data source;
determine, according to the priority of each specified theme attribute in each specified data source, the theme data corresponding to each specified theme attribute and the provenance of the theme data, to obtain a theme provenance table; and
generate, according to the theme provenance table, a theme table that characterizes the theme library.
The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:
The disclosure sets a theme priority configuration table that configures the priority of each specified theme attribute in each specified data source; determines, according to those priorities, the theme data corresponding to each specified theme attribute and the provenance of the theme data, to obtain a theme provenance table; and generates, according to the theme provenance table, a theme table that characterizes the theme library. This facilitates vertical expansion, horizontal expansion, and traceability of the data warehouse, and improves the reliability of constructing the data warehouse.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the present disclosure.
Brief description of the drawings
Fig. 1 is a flowchart of a method for constructing a data warehouse according to an exemplary embodiment;
Fig. 2 is a flowchart of another method for constructing a data warehouse according to an exemplary embodiment;
Fig. 3 is a flowchart of another method for constructing a data warehouse according to an exemplary embodiment;
Fig. 4 is a block diagram of a device for constructing a data warehouse according to an exemplary embodiment;
Fig. 5 is a block diagram of another device for constructing a data warehouse according to an exemplary embodiment;
Fig. 6 is a block diagram of another device for constructing a data warehouse according to an exemplary embodiment;
Fig. 7 is a schematic structural diagram of a device for constructing a data warehouse according to an exemplary embodiment.
Detailed description
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the disclosure as detailed in the appended claims.
The terms used in the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the disclosure. The singular forms "a", "said", and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in the present disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
Fig. 1 is a flowchart of a method for constructing a data warehouse according to an exemplary embodiment of the present disclosure. The data warehouse may include one or more theme libraries, and each theme library may be oriented to a different theme, for example, major themes such as person, place, event, object, and case. As shown in Fig. 1, the method for constructing a data warehouse may include the following steps 110-130:
In step 110, a theme priority configuration table is set; the theme priority configuration table is used to configure the priority of each specified theme attribute in each specified data source.
In the embodiments of the present disclosure, the theme table is not generated directly from the specified data sources when the data warehouse is constructed. Instead, to facilitate vertical expansion and traceability, intermediate tables are set first, such as the theme priority configuration table and the theme provenance table, and the theme table is then obtained from these intermediate tables.
In one embodiment, the specified theme attributes in step 110 may be theme attributes that are extracted from the specified data sources and used to describe the theme library; the specified data sources may be data sources designated for constructing the theme library.
In one embodiment, the theme priority configuration table in step 110 may include a first class of fields describing the specified data sources, a second class of fields describing the specified theme attributes, and a third class of fields describing the priority of each specified theme attribute in each specified data source.
In one embodiment, the theme priority configuration table in step 110 further includes reserved fields, which are fields reserved for data sources and/or theme attributes to be added in later expansion.
For example, taking the person theme, the specified data sources include the case table shown in Table 1 below and the permanent resident population table shown in Table 2.
Table 1
| Suspect ID | Permanent address 1 | Native place 1 | Height 1 | Weight 1 |
| 1000001 | A2 | Zhejiang | 1.70 | 68 |
| 1000002 | B2 | Guangdong | 1.72 | 60 |
| 1000003 | C2 | Henan | 1.80 | 77 |
| 1000004 | D2 | Shanghai | 1.55 | 44 |
Table 2
| Person ID | Permanent address 2 | Native place 2 |
| 1000001 | A3 | Zhejiang |
| 1000002 | | Hubei |
| 1000003 | C3 | Hunan |
| 1000005 | | Yunnan |
The specified theme attributes include permanent address, native place, height, and weight. The theme priority configuration table that is set is shown in Table 3.
Table 3
The person ID (identity number) in Table 3 is the key field: only records whose key field is identical in the source data tables can be integrated together across multiple data sources. Apart from the key field, every other field in Table 3 has a corresponding priority. For example, the priority of permanent address 1 in Table 1 is 90, and the priority of permanent address 2 in Table 2 is 95; the larger the priority number, the higher the priority.
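The contents of Table 3 are not reproduced in this text, so the following is only a minimal sketch of how such a configuration table might be represented. The class `PriorityEntry`, the helper `sources_by_priority`, and every priority other than 90 and 95 for the permanent address are assumptions made for illustration, not values taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorityEntry:
    source: str        # specified data source (first class of fields)
    source_field: str  # column name in that data source
    attribute: str     # specified theme attribute (second class of fields)
    priority: int      # larger number means higher priority (third class)


# Key field of each source; records are joined only when these values match.
KEY_FIELDS = {"Table 1": "Suspect ID", "Table 2": "Person ID"}

CONFIG = [
    PriorityEntry("Table 1", "Permanent address 1", "permanent address", 90),  # from the text
    PriorityEntry("Table 2", "Permanent address 2", "permanent address", 95),  # from the text
    PriorityEntry("Table 1", "Native place 1", "native place", 85),  # assumed value
    PriorityEntry("Table 2", "Native place 2", "native place", 80),  # assumed value
    PriorityEntry("Table 1", "Height 1", "height", 90),              # assumed value
    PriorityEntry("Table 1", "Weight 1", "weight", 90),              # assumed value
]


def sources_by_priority(config, attribute):
    """Return the entries for one theme attribute, highest priority first."""
    entries = [e for e in config if e.attribute == attribute]
    return sorted(entries, key=lambda e: e.priority, reverse=True)


ranked = sources_by_priority(CONFIG, "permanent address")
print([(e.source, e.priority) for e in ranked])
```

With the two priorities stated in the text, Table 2 ranks ahead of Table 1 for the permanent address.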
In Table 3, the minimum gap between configured priorities is 5. This reserves room in advance, so that a new priority can easily be inserted when a data source with a similar priority is added later. The gap may of course be larger than 5, for example 10 or 100.
If the specified data sources are expanded, it is only necessary to add the new specified data source to Table 3, fill in its key field, and fill in the corresponding value fields and priorities. Expanding data sources is therefore easy in the present disclosure: only the configuration table needs to be modified, which facilitates horizontal expansion of the theme table.
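As an illustration of why the reserved gaps help, the sketch below adds a new data source whose priority falls between the two existing permanent-address priorities. The source name "Table X", its field name, and the helper `priority_between` are hypothetical; the text only states that a new source is added to the configuration table with its key field, value fields, and priorities.

```python
def priority_between(lower, upper):
    """Pick an unused integer priority strictly between two existing ones.

    Returns None when the gap is exhausted, which is the situation the
    reserved spacing (5 or more) is meant to avoid.
    """
    if upper - lower < 2:
        return None
    return (lower + upper) // 2


config = [
    {"source": "Table 1", "field": "Permanent address 1",
     "attribute": "permanent address", "priority": 90},
    {"source": "Table 2", "field": "Permanent address 2",
     "attribute": "permanent address", "priority": 95},
]

# Hypothetical new source ranked between Table 1 and Table 2.
new_priority = priority_between(90, 95)
config.append({"source": "Table X", "field": "Permanent address X",
               "attribute": "permanent address", "priority": new_priority})

ranking = [row["source"]
           for row in sorted(config, key=lambda r: r["priority"], reverse=True)]
print(new_priority, ranking)
```

Only the configuration rows change here; the existing sources and their priorities are left untouched.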
In step 120, according to the priority of each specified theme attribute in each specified data source, the theme data corresponding to each specified theme attribute and the provenance of the theme data are determined, to obtain a theme provenance table.
In the embodiments of the present disclosure, when the theme data corresponding to each specified theme attribute is determined, the source data that has the higher priority and is valid is generally chosen.
In one embodiment, the theme provenance table in step 120 may include a fourth field describing the theme data corresponding to each specified theme attribute and a fifth field describing the provenance of the theme data.
In step 130, a theme table that characterizes the theme library is generated according to the theme provenance table.
In the embodiments of the present disclosure, the theme table need not include the provenance of the theme data corresponding to each specified theme attribute. Therefore, when the theme table is generated from the theme provenance table, the provenance of the theme data corresponding to each specified theme attribute can simply be removed.
As can be seen from the above embodiment, a theme priority configuration table is set to configure the priority of each specified theme attribute in each specified data source; the theme data corresponding to each specified theme attribute and the provenance of the theme data are determined according to those priorities, to obtain a theme provenance table; and a theme table that characterizes the theme library is generated according to the theme provenance table. This facilitates vertical expansion, horizontal expansion, and traceability of the data warehouse, and improves the reliability of constructing the data warehouse.
Fig. 2 is a flowchart of another method for constructing a data warehouse according to an exemplary embodiment of the present disclosure. The method can build on the method shown in Fig. 1. As shown in Fig. 2, executing step 120 may include the following steps 210-230:
In step 210, for any specified theme attribute, the specified data source with the highest priority is selected according to the priority of that specified theme attribute in each specified data source.
In step 220, when the source data corresponding to the specified theme attribute in the highest-priority specified data source is valid, that source data is determined as the theme data corresponding to the specified theme attribute, and the highest-priority specified data source is determined as the provenance of the theme data, to obtain the theme provenance table.
In step 230, when the source data corresponding to the specified theme attribute in the highest-priority specified data source is invalid, the specified data source with the next-highest priority is selected according to the priority of the specified theme attribute in each specified data source, and so on until the source data found for the specified theme attribute is valid; the corresponding theme provenance table is then determined.
In one embodiment, the theme provenance table in steps 220 and 230 may include a fourth field describing the theme data corresponding to each specified theme attribute and a fifth field describing the provenance of the theme data.
For example, taking the person theme, the specified data sources include the case table shown in Table 1 above and the permanent resident population table shown in Table 2. The specified theme attributes include permanent address, native place, height, and weight. The theme priority configuration table that is set is shown in Table 3 above, and the resulting theme provenance table is shown in Table 4.
Table 4
The process of obtaining Table 4 is as follows:
(1) According to the specified data sources and the key field in Table 3, the records with the same person ID are taken from the specified data sources, for example the person whose ID is 1000001 in both Table 1 and Table 2.
(2) In Table 3, the priority of the permanent address 1 field of Table 1 is 90, and the priority of the permanent address 2 field of Table 2 is 95.
(3) The permanent addresses of person ID 1000001 are taken from Table 1 and Table 2 and compared by priority. The high-priority, valid value is written to the corresponding permanent address field in Table 4 (1000001.permanent address = A3), and "Table 2" is written to the permanent address source field in Table 4, indicating that this field comes from Table 2.
(4) Similarly, for the permanent address of person ID 1000002, the value in Table 2 is empty and therefore invalid, so the valid value in Table 1 is written to the corresponding permanent address field in Table 4 (1000002.permanent address = B2), and "Table 1" is written to the permanent address source field in Table 4, indicating that this field comes from Table 1.
(5) All remaining data is filled in by analogy.
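Steps (1) to (5) can be sketched as follows, using only the permanent address columns of Table 1 and Table 2. This is an illustrative sketch, not the patent's implementation: the function names, the dictionary layout, and the rule that an empty or missing value counts as invalid are assumptions (the text only gives the empty value as an example of invalid data).

```python
# Permanent address columns of Table 1 and Table 2, keyed by person ID.
TABLE_1 = {
    "1000001": {"Permanent address 1": "A2"},
    "1000002": {"Permanent address 1": "B2"},
    "1000003": {"Permanent address 1": "C2"},
    "1000004": {"Permanent address 1": "D2"},
}
TABLE_2 = {
    "1000001": {"Permanent address 2": "A3"},
    "1000002": {"Permanent address 2": ""},
    "1000003": {"Permanent address 2": "C3"},
    "1000005": {"Permanent address 2": ""},
}
SOURCES = {"Table 1": TABLE_1, "Table 2": TABLE_2}

# (source, source field, priority) for each theme attribute, as in Table 3.
CONFIG = {
    "permanent address": [
        ("Table 1", "Permanent address 1", 90),
        ("Table 2", "Permanent address 2", 95),
    ],
}


def is_valid(value):
    """Assumed validity rule: a missing or blank value is invalid."""
    return value is not None and str(value).strip() != ""


def resolve(person_id, attribute):
    """Return (theme data, provenance) from the best valid source."""
    candidates = sorted(CONFIG[attribute], key=lambda c: c[2], reverse=True)
    for source, field, _priority in candidates:
        record = SOURCES[source].get(person_id)
        if record is not None and is_valid(record.get(field)):
            return record[field], source
    return None, None  # no source holds valid data for this attribute


def build_provenance_table(person_ids, attributes):
    """Build one row per person with a value and a source column per attribute."""
    rows = []
    for pid in person_ids:
        row = {"person id": pid}
        for attr in attributes:
            value, source = resolve(pid, attr)
            row[attr] = value
            row[attr + " source"] = source
        rows.append(row)
    return rows


for row in build_provenance_table(["1000001", "1000002"], ["permanent address"]):
    print(row)
```

For person 1000001 the sketch selects A3 from Table 2, and for person 1000002 it falls back to B2 from Table 1, matching steps (3) and (4).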
As can be seen from the above embodiment, for any specified theme attribute, the specified data source with the highest priority can be selected according to the priority of that attribute in each specified data source; the valid source data corresponding to the attribute in the highest-priority specified data source is determined as the theme data corresponding to the attribute; and the highest-priority specified data source is determined as the provenance of the theme data, to obtain the theme provenance table. This improves the efficiency of generating the theme provenance table and the practicality of the generated table.
Fig. 3 is a flowchart of another method for constructing a data warehouse according to an exemplary embodiment of the present disclosure. The method can build on the method shown in Fig. 1. As shown in Fig. 3, executing step 130 may include the following steps 310-330:
In step 310, the provenance of the theme data included in the theme provenance table is deleted, to obtain a temporary theme table.
In step 320, a mapping table from the temporary theme table to the theme table is set; the mapping table includes the mapping relations between temporary-theme-table fields and theme-table fields.
In one embodiment, the first field data included in the temporary-theme-table fields in step 320 is the specified theme attributes included in the temporary theme table; the second field data included in the theme-table fields is the specified theme attributes included in the theme table; and the mapping relations include a first mapping relation between the specified theme attributes included in the temporary theme table and the specified theme attributes included in the theme table;
the third field data included in the temporary-theme-table fields is the reserved theme attributes included in the temporary theme table; the fourth field data included in the theme-table fields is the reserved theme attributes included in the theme table; and the mapping relations include a second mapping relation between the reserved theme attributes included in the temporary theme table and the reserved theme attributes included in the theme table.
In step 330, the theme table corresponding to the temporary theme table is determined according to the mapping table.
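Steps 310 to 330 can be illustrated with a small sketch. The column names, the " source" suffix convention for provenance columns, the reserved column `reserved 1`, and both helper functions are hypothetical, since the contents of the provenance, temporary, mapping, and theme tables are not reproduced in this text.

```python
# One hypothetical row of the theme provenance table.
PROVENANCE_ROW = {
    "person id": "1000001",
    "permanent address": "A3",
    "permanent address source": "Table 2",
    "reserved 1": None,
    "reserved 1 source": None,
}


def drop_provenance(row, suffix=" source"):
    """Step 310: remove provenance columns to get a temporary theme table row."""
    return {name: value for name, value in row.items()
            if not name.endswith(suffix)}


# Step 320: mapping from temporary-theme-table fields to theme-table fields.
# The reserved field is not mapped, so it does not appear in the theme table.
MAPPING = {
    "person id": "person id",
    "permanent address": "permanent address",
}


def apply_mapping(temp_row, mapping):
    """Step 330: build a theme table row from the mapped fields only."""
    return {target: temp_row[source] for source, target in mapping.items()}


temp_row = drop_provenance(PROVENANCE_ROW)
theme_row = apply_mapping(temp_row, MAPPING)
print(temp_row)
print(theme_row)
```

Under these assumptions, enabling a reserved field later would only require adding one entry to `MAPPING`, which is the kind of configuration-only change the text associates with horizontal expansion.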
For example, taking the person theme, the specified data sources include the case table shown in Table 1 above and the permanent resident population table shown in Table 2. The specified theme attributes include permanent address, native place, height, and weight. The theme priority configuration table that is set is shown in Table 3 above, and the resulting theme provenance table is shown in Table 4 above. In addition, the temporary theme table is shown in Table 5 below, the mapping table from the temporary theme table to the theme table is shown in Table 6, and the person theme table is shown in Table 7.
Table 5
Table 6
Table 7
Table 4 and Table 5 contain reserved fields, and Table 6 records the mapping relations between the reserved fields and the actual fields; Table 7 can be obtained by mapping Table 5 through Table 6. In addition, to achieve horizontal expansion, the mapping table is modified and the configuration table is enabled; that is, the corresponding reserved field is retained in Table 7 and enabled in Table 5.
As can be seen from the above embodiment, the provenance of the theme data included in the theme provenance table is deleted to obtain a temporary theme table; a mapping table from the temporary theme table to the theme table is set, the mapping table including the mapping relations between temporary-theme-table fields and theme-table fields; and the theme table corresponding to the temporary theme table is determined according to the mapping table, which improves the accuracy of the generated theme table.
Corresponding to the foregoing method embodiments for constructing a data warehouse, the present disclosure also provides device embodiments for constructing a data warehouse.
Fig. 4 is a block diagram of a device for constructing a data warehouse according to an exemplary embodiment of the present disclosure. The device is used to execute the method for constructing a data warehouse shown in Fig. 1. The data warehouse may include one or more theme libraries, and each theme library may be oriented to a different theme, for example, major themes such as person, event, object, and case. As shown in Fig. 4, the device for constructing a data warehouse may include:
a setting module 41, configured to set a theme priority configuration table, where the theme priority configuration table is used to configure the priority of each specified theme attribute in each specified data source;
a determining module 42, configured to determine, according to the priority of each specified theme attribute in each specified data source, the theme data corresponding to each specified theme attribute and the provenance of the theme data, to obtain a theme provenance table; and
a generation module 43, configured to generate, according to the theme provenance table, a theme table that characterizes the theme library.
In one embodiment, on the basis of the device shown in Fig. 4, the specified theme attributes are theme attributes that are extracted from the specified data sources and used to describe the theme library; the specified data sources are data sources designated for constructing the theme library.
In one embodiment, on the basis of the device described above, the theme priority configuration table includes a first class of fields describing the specified data sources, a second class of fields describing the specified theme attributes, and a third class of fields describing the priority of each specified theme attribute in each specified data source.
In one embodiment, on the basis of the device described above, the theme priority configuration table further includes reserved fields, which are fields reserved for data sources and/or theme attributes to be added in later expansion.
As can be seen from the above embodiment, a theme priority configuration table is set to configure the priority of each specified theme attribute in each specified data source; the theme data corresponding to each specified theme attribute and the provenance of the theme data are determined according to those priorities, to obtain a theme provenance table; and a theme table that characterizes the theme library is generated according to the theme provenance table. This facilitates vertical expansion, horizontal expansion, and traceability of the data warehouse, and improves the reliability of constructing the data warehouse.
In one embodiment, on the basis of the device shown in Fig. 4, and as shown in Fig. 5, the determining module 42 may include:
a selection submodule 51, configured to select, for any specified theme attribute, the specified data source with the highest priority according to the priority of that specified theme attribute in each specified data source;
a first determining submodule 52, configured to, when the source data corresponding to the specified theme attribute in the highest-priority specified data source is valid, determine that source data as the theme data corresponding to the specified theme attribute, and determine the highest-priority specified data source as the provenance of the theme data, to obtain the theme provenance table; and
a second determining submodule 53, configured to, when the source data corresponding to the specified theme attribute in the highest-priority specified data source is invalid, select the specified data source with the next-highest priority according to the priority of the specified theme attribute in each specified data source, and so on until the source data found for the specified theme attribute is valid, and then determine the corresponding theme provenance table.
In one embodiment, on the basis of the device shown in Fig. 4 or Fig. 5, the theme provenance table includes a fourth field describing the theme data corresponding to each specified theme attribute and a fifth field describing the provenance of the theme data.
As can be seen from the above embodiment, for any specified theme attribute, the specified data source with the highest priority can be selected according to the priority of that attribute in each specified data source; the valid source data corresponding to the attribute in the highest-priority specified data source is determined as the theme data corresponding to the attribute; and the highest-priority specified data source is determined as the provenance of the theme data, to obtain the theme provenance table. This improves the efficiency of generating the theme provenance table and the practicality of the generated table.
In one embodiment, on the basis of the device shown in Fig. 4, and as shown in Fig. 6, the generation module 43 may include:
a deletion submodule 61, configured to delete the provenance of the theme data included in the theme provenance table, to obtain a temporary theme table;
a setting submodule 62, configured to set a mapping table from the temporary theme table to the theme table, where the mapping table includes the mapping relations between temporary-theme-table fields and theme-table fields; and
a third determining submodule 63, configured to determine, according to the mapping table, the theme table corresponding to the temporary theme table.
In one embodiment, building on the device shown in Fig. 6, the first field data included in the subject temporary table fields are the specified subject attributes included in the subject temporary table; the second field data included in the subject table fields are the specified subject attributes included in the subject table; and the mapping relations include a first mapping relation between the specified subject attributes included in the subject temporary table and the specified subject attributes included in the subject table;
the third field data included in the subject temporary table fields are the reserved subject attributes included in the subject temporary table; the fourth field data included in the subject table fields are the reserved subject attributes included in the subject table; and the mapping relations include a second mapping relation between the reserved subject attributes included in the subject temporary table and the reserved subject attributes included in the subject table.
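One possible way to picture a mapping table that carries both mapping relations is sketched below. The `FieldMapping` record, the `reserved` flag, and all field names are hypothetical and chosen only for illustration; the source does not specify how the mapping table is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    temp_field: str     # field of the temporary table
    subject_field: str  # field of the final subject table
    reserved: bool      # True if the field is a reserved attribute


# Hypothetical mapping table with two ordinary attributes and one reserved one.
MAPPING_TABLE = [
    FieldMapping("name", "customer_name", reserved=False),
    FieldMapping("phone", "customer_phone", reserved=False),
    FieldMapping("reserved_1", "ext_attr_1", reserved=True),
]


def split_relations(mapping_table):
    """Split the table into the first (specified) and second (reserved) relations."""
    first = {m.temp_field: m.subject_field for m in mapping_table if not m.reserved}
    second = {m.temp_field: m.subject_field for m in mapping_table if m.reserved}
    return first, second


first_relation, second_relation = split_relations(MAPPING_TABLE)
```

Keeping reserved attributes in the same table but under a separate relation lets new attributes be mapped later without touching the existing entries.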
As can be seen from the above embodiment, the provenance of the subject data included in the subject provenance table is deleted to obtain a subject temporary table; a mapping table from the subject temporary table to the subject table is set, the mapping table including the mapping relations between subject temporary table fields and subject table fields; and the subject table corresponding to the subject temporary table is determined according to the mapping table, thereby improving the accuracy of the generated subject table.
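The three steps above (drop the provenance, set a field mapping, derive the final table) might look like the following sketch. The sample rows, the mapping, and the choice to raise an error for unmapped fields are assumptions for illustration only.

```python
# Hypothetical provenance table: attribute -> subject data plus its provenance.
PROVENANCE_ROWS = {
    "name": {"subject_data": "Alice", "provenance": "crm"},
    "phone": {"subject_data": "555-0100", "provenance": "billing"},
}

# Hypothetical mapping from temporary-table fields to final-table fields.
FIELD_MAPPING = {"name": "customer_name", "phone": "customer_phone"}


def to_temporary_table(provenance_rows):
    """Step 1: delete the provenance, keeping only attribute -> subject data."""
    return {attr: row["subject_data"] for attr, row in provenance_rows.items()}


def to_subject_table(temporary_table, mapping):
    """Steps 2 and 3: rename each temporary field through the mapping table."""
    missing = [field for field in temporary_table if field not in mapping]
    if missing:
        # Assumed policy: an unmapped field is an error, not silently dropped.
        raise KeyError(f"no mapping for fields: {missing}")
    return {mapping[field]: value for field, value in temporary_table.items()}


temporary_table = to_temporary_table(PROVENANCE_ROWS)
subject_table = to_subject_table(temporary_table, FIELD_MAPPING)
```

Failing loudly on an unmapped field is one design choice that supports the accuracy goal mentioned above; a real system might instead log and skip such fields.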
For the device embodiments, since they substantially correspond to the method embodiments, refer to the description of the method embodiments for the relevant details. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present disclosure. Those of ordinary skill in the art can understand and implement it without creative effort.
The present disclosure further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for constructing a data warehouse shown in any of Fig. 2 to Fig. 4.
The present disclosure further provides a device for constructing a data warehouse, the data warehouse including one or more subject libraries, the device comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
set a subject priority configuration table, the subject priority configuration table being used to configure the priority of each specified subject attribute in each specified data source;
determine, according to the priority of each specified subject attribute in each specified data source, the subject data corresponding to each specified subject attribute and the provenance of the subject data, obtaining a subject provenance table;
generate, according to the subject provenance table, a subject table for characterizing the subject library.
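As an illustration of the first configured step, the priority configuration table could be represented as rows with one field for the data source, one for the attribute, one for the priority, and a reserved field. The row layout, the sample values, and the convention that a smaller number is a higher priority are assumptions made for this sketch.

```python
from collections import defaultdict

# Hypothetical configuration rows; "reserved" stands in for a reserved field
# kept empty for later expansion.
CONFIG_ROWS = [
    {"data_source": "crm", "subject_attribute": "name", "priority": 1, "reserved": None},
    {"data_source": "billing", "subject_attribute": "name", "priority": 2, "reserved": None},
    {"data_source": "crm", "subject_attribute": "phone", "priority": 2, "reserved": None},
    {"data_source": "billing", "subject_attribute": "phone", "priority": 1, "reserved": None},
]


def index_priorities(rows):
    """Index the rows as attribute -> {data source: priority}."""
    index = defaultdict(dict)
    for row in rows:
        index[row["subject_attribute"]][row["data_source"]] = row["priority"]
    return dict(index)


def ranked_sources(index, attribute):
    """List the data sources for an attribute, highest priority first."""
    # Assumption: a smaller priority number means a higher priority.
    return sorted(index[attribute], key=index[attribute].get)


priority_index = index_priorities(CONFIG_ROWS)
```

With this index, `ranked_sources(priority_index, "phone")` gives the order in which the data sources would be consulted for the `phone` attribute.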
As shown in Fig. 7, Fig. 7 is a schematic structural diagram of a device 700 for constructing a data warehouse according to an exemplary embodiment. Referring to Fig. 7, the device 700 includes a processing component 722, which further includes one or more processors, and memory resources represented by a memory 716 for storing instructions executable by the processing component 722, such as application programs. The application programs stored in the memory 716 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 722 is configured to execute the instructions so as to perform the method for constructing a data warehouse described in any of Fig. 2 to Fig. 4.
The device 700 may also include a power supply component 726 configured to perform power management of the device 700, a wired or wireless network interface 750 configured to connect the device 700 to a network, and an input/output (I/O) interface 758. The device 700 may operate based on an operating system stored in the memory 716, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (18)
1. A method for constructing a data warehouse, characterized in that the data warehouse includes one or more subject libraries, the method comprising:
setting a subject priority configuration table, the subject priority configuration table being used to configure the priority of each specified subject attribute in each specified data source;
determining, according to the priority of each specified subject attribute in each specified data source, the subject data corresponding to each specified subject attribute and the provenance of the subject data, to obtain a subject provenance table;
generating, according to the subject provenance table, a subject table for characterizing the subject library.
2. The method according to claim 1, characterized in that the specified subject attribute is a subject attribute extracted from each specified data source and used to describe the subject library; and the specified data source is a data source designated for constructing the subject library.
3. The method according to claim 1 or 2, characterized in that the subject priority configuration table includes a first-class field for describing the specified data source, a second-class field for describing the specified subject attribute, and a third-class field for describing the priority of the specified subject attribute in each specified data source.
4. The method according to claim 3, characterized in that the subject priority configuration table further includes a reserved field, the reserved field being a field reserved for data sources and/or reserved subject attributes used in subsequent expansion.
5. The method according to claim 1, characterized in that determining, according to the priority of each specified subject attribute in each specified data source, the subject data corresponding to each specified subject attribute and the provenance of the subject data, to obtain the subject provenance table, comprises:
for any specified subject attribute, selecting the specified data source with the highest priority according to the priorities of that specified subject attribute in the specified data sources;
when the source data corresponding to the specified subject attribute in the specified data source with the highest priority is valid data, determining that source data as the subject data corresponding to the specified subject attribute, and determining the specified data source with the highest priority as the provenance of the subject data, to obtain the subject provenance table;
when the source data corresponding to the specified subject attribute in the specified data source with the highest priority is invalid data, selecting the specified data source with the next-highest priority according to the priorities of that specified subject attribute in the specified data sources, until the source data found for the specified subject attribute is valid data, and then determining the corresponding subject provenance table.
6. The method according to claim 1 or 5, characterized in that the subject provenance table includes a fourth field for describing the subject data corresponding to each specified subject attribute and a fifth field for describing the provenance of the subject data.
7. The method according to claim 4, characterized in that generating, according to the subject provenance table, the subject table for characterizing the subject library comprises:
deleting the provenance of the subject data included in the subject provenance table, to obtain a subject temporary table;
setting a mapping table from the subject temporary table to the subject table, the mapping table including the mapping relations between subject temporary table fields and subject table fields;
determining, according to the mapping table, the subject table corresponding to the subject temporary table.
8. The method according to claim 7, characterized in that the first field data included in the subject temporary table fields are the specified subject attributes included in the subject temporary table; the second field data included in the subject table fields are the specified subject attributes included in the subject table; and the mapping relations include a first mapping relation between the specified subject attributes included in the subject temporary table and the specified subject attributes included in the subject table;
the third field data included in the subject temporary table fields are the reserved subject attributes included in the subject temporary table; the fourth field data included in the subject table fields are the reserved subject attributes included in the subject table; and the mapping relations include a second mapping relation between the reserved subject attributes included in the subject temporary table and the reserved subject attributes included in the subject table.
9. A device for constructing a data warehouse, characterized in that the data warehouse includes one or more subject libraries, the device comprising:
a setting module, configured to set a subject priority configuration table, the subject priority configuration table being used to configure the priority of each specified subject attribute in each specified data source;
a determining module, configured to determine, according to the priority of each specified subject attribute in each specified data source, the subject data corresponding to each specified subject attribute and the provenance of the subject data, to obtain a subject provenance table;
a generation module, configured to generate, according to the subject provenance table, a subject table for characterizing the subject library.
10. The device according to claim 9, characterized in that the specified subject attribute is a subject attribute extracted from each specified data source and used to describe the subject library; and the specified data source is a data source designated for constructing the subject library.
11. The device according to claim 9 or 10, characterized in that the subject priority configuration table includes a first-class field for describing the specified data source, a second-class field for describing the specified subject attribute, and a third-class field for describing the priority of the specified subject attribute in each specified data source.
12. The device according to claim 11, characterized in that the subject priority configuration table further includes a reserved field, the reserved field being a field reserved for data sources and/or reserved subject attributes used in subsequent expansion.
13. The device according to claim 9, characterized in that the determining module includes:
a selection submodule, configured to, for any specified subject attribute, select the specified data source with the highest priority according to the priorities of that specified subject attribute in the specified data sources;
a first determining submodule, configured to, when the source data corresponding to the specified subject attribute in the specified data source with the highest priority is valid data, determine that source data as the subject data corresponding to the specified subject attribute, and determine the specified data source with the highest priority as the provenance of the subject data, to obtain the subject provenance table;
a second determining submodule, configured to, when the source data corresponding to the specified subject attribute in the specified data source with the highest priority is invalid data, select the specified data source with the next-highest priority according to the priorities of that specified subject attribute in the specified data sources, until the source data found for the specified subject attribute is valid data, and then determine the corresponding subject provenance table.
14. The device according to claim 9 or 13, characterized in that the subject provenance table includes a fourth field for describing the subject data corresponding to each specified subject attribute and a fifth field for describing the provenance of the subject data.
15. The device according to claim 12, characterized in that the generation module includes:
a deletion submodule, configured to delete the provenance of the subject data included in the subject provenance table, to obtain a subject temporary table;
a setting submodule, configured to set a mapping table from the subject temporary table to the subject table, the mapping table including the mapping relations between subject temporary table fields and subject table fields;
a third determining submodule, configured to determine, according to the mapping table, the subject table corresponding to the subject temporary table.
16. The device according to claim 15, characterized in that the first field data included in the subject temporary table fields are the specified subject attributes included in the subject temporary table; the second field data included in the subject table fields are the specified subject attributes included in the subject table; and the mapping relations include a first mapping relation between the specified subject attributes included in the subject temporary table and the specified subject attributes included in the subject table;
the third field data included in the subject temporary table fields are the reserved subject attributes included in the subject temporary table; the fourth field data included in the subject table fields are the reserved subject attributes included in the subject table; and the mapping relations include a second mapping relation between the reserved subject attributes included in the subject temporary table and the reserved subject attributes included in the subject table.
17. A non-transitory computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
18. A device for constructing a data warehouse, characterized in that the data warehouse includes one or more subject libraries, the device comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
set a subject priority configuration table, the subject priority configuration table being used to configure the priority of each specified subject attribute in each specified data source;
determine, according to the priority of each specified subject attribute in each specified data source, the subject data corresponding to each specified subject attribute and the provenance of the subject data, to obtain a subject provenance table;
generate, according to the subject provenance table, a subject table for characterizing the subject library.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910563806.4A CN110297818B (en) | 2019-06-26 | 2019-06-26 | Method and device for constructing data warehouse |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910563806.4A CN110297818B (en) | 2019-06-26 | 2019-06-26 | Method and device for constructing data warehouse |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN110297818A true CN110297818A (en) | 2019-10-01 |
| CN110297818B CN110297818B (en) | 2022-03-01 |
Family
ID=68029128
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910563806.4A Active CN110297818B (en) | 2019-06-26 | 2019-06-26 | Method and device for constructing data warehouse |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN110297818B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111143463A (en) * | 2020-01-06 | 2020-05-12 | 中国工商银行股份有限公司 | Method and device for constructing bank data warehouse based on topic model |
Citations (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20040049492A1 (en) * | 2002-09-09 | 2004-03-11 | Lucent Technologies Inc. | Distinct sampling system and a method of distinct sampling for a database |
| CN1975772A (en) * | 2006-12-22 | 2007-06-06 | 中国建设银行股份有限公司 | Method and device for integrating information in multi-system |
| US20080282085A1 (en) * | 2005-12-09 | 2008-11-13 | Eurotech Spa | Method to Search for Affinities Between Subjects and Relative Apparatus |
| US20110004622A1 (en) * | 2007-10-17 | 2011-01-06 | Blazent, Inc. | Method and apparatus for gathering and organizing information pertaining to an entity |
| CN103853820A (en) * | 2014-02-20 | 2014-06-11 | 北京用友政务软件有限公司 | Data processing method and data processing system |
| CN105830053A (en) * | 2014-01-16 | 2016-08-03 | 英特尔公司 | Apparatus, method and system for rapid configuration mechanism |
| CN106294521A (en) * | 2015-06-12 | 2017-01-04 | 交通银行股份有限公司 | Date storage method and data warehouse |
| US20170116307A1 (en) * | 2015-10-23 | 2017-04-27 | Numerify, Inc. | Automated Refinement and Validation of Data Warehouse Star Schemas |
| US20170116306A1 (en) * | 2015-10-23 | 2017-04-27 | Numerify, Inc. | Automated Definition of Data Warehouse Star Schemas |
| CN106933907A (en) * | 2015-12-31 | 2017-07-07 | 北京国双科技有限公司 | The processing method and processing device of tables of data extended counter |
| CN107657049A (en) * | 2017-09-30 | 2018-02-02 | 深圳市华傲数据技术有限公司 | A kind of data processing method based on data warehouse |
| CN107704590A (en) * | 2017-09-30 | 2018-02-16 | 深圳市华傲数据技术有限公司 | A kind of data processing method and system based on data warehouse |
| CN108520008A (en) * | 2018-03-15 | 2018-09-11 | 链家网(北京)科技有限公司 | The construction method and construction device of data warehouse model |
| CN109033173A (en) * | 2018-06-21 | 2018-12-18 | 深圳市彬讯科技有限公司 | It is a kind of for generating the data processing method and device of multidimensional index data |
| CN109145164A (en) * | 2018-08-28 | 2019-01-04 | 百度在线网络技术(北京)有限公司 | Data processing method, device, equipment and medium |
| CN109522312A (en) * | 2018-11-27 | 2019-03-26 | 北京锐安科技有限公司 | A kind of data processing method, device, server and storage medium |
Non-Patent Citations (3)
| Title |
|---|
| CHAVINKING: "什么是数据仓库主题", 《HTTPS://WWW.CNBLOGS.COM/WCWEN1990/P/7600251.HTML》 * |
| 周世雄: "基于供应链的数据仓库系统在服装行业的应用研究", 《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》 * |
| 张洪波: "基于供应链的数据仓库系统研究", 《中国优秀硕博士学位论文全文数据库(硕士) 信息科技辑》 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111143463A (en) * | 2020-01-06 | 2020-05-12 | 中国工商银行股份有限公司 | Method and device for constructing bank data warehouse based on topic model |
| CN111143463B (en) * | 2020-01-06 | 2023-07-04 | 中国工商银行股份有限公司 | Construction method and device of bank data warehouse based on topic model |
Also Published As
| Publication number | Publication date |
|---|---|
| CN110297818B (en) | 2022-03-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109829337B (en) | A method, system and device for social network privacy protection | |
| CN106599104B (en) | Massive data association method based on redis cluster | |
| CN110162637B (en) | Information map construction method, device and equipment | |
| TWI652586B (en) | Group search method and device based on social network | |
| CN112528067A (en) | Graph database storage method, graph database reading method, graph database storage device, graph database reading device and graph database reading equipment | |
| CN114253944A (en) | Database bidirectional synchronization method and device and electronic equipment | |
| CN110297818A (en) | Construct the method and device of data warehouse | |
| CN106383826A (en) | Database checking method and apparatus | |
| CN115344550A (en) | Method, device and medium for cloning directories of distributed file system | |
| CN119782590A (en) | Question and answer processing method, device, electronic device and storage medium based on large model | |
| WO2022160443A1 (en) | Lineage mining method and apparatus, electronic device and computer-readable storage medium | |
| CN119376971A (en) | Large model service startup method, device, equipment and medium | |
| CN114610751B (en) | Structured parameter parsing method, device, equipment and medium of geographic computing language | |
| US20240275848A1 (en) | Content initialization method, electronic device and storage medium | |
| US11681545B2 (en) | Reducing complexity of workflow graphs through vertex grouping and contraction | |
| CN117687914A (en) | Test case generation method, device, equipment and medium based on contract file | |
| CN103761617A (en) | Method and system for approval process management in cloud data center | |
| CN110059080B (en) | Data processing method and device | |
| CN114968950A (en) | Task processing method, apparatus, electronic device and medium | |
| CN114996284A (en) | Asynchronous remote replication method, device and medium | |
| CN105516274A (en) | Method and system for realizing general management of SAN based on cloud platform | |
| CN114442959B (en) | Data writing method, device and system of multi-region storage system | |
| CN115034895B (en) | A blockchain node management method, device and electronic device | |
| CN115033823B (en) | Method, apparatus, device, medium, and article for processing data | |
| US20250356836A1 (en) | Joint training |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |