CN107391584B - Faceted Search Method and System Based on Formal Concept Lattice - Google Patents
Faceted Search Method and System Based on Formal Concept Lattice Download PDFInfo
- Publication number
- CN107391584B CN107391584B CN201710483747.0A CN201710483747A CN107391584B CN 107391584 B CN107391584 B CN 107391584B CN 201710483747 A CN201710483747 A CN 201710483747A CN 107391584 B CN107391584 B CN 107391584B
- Authority
- CN
- China
- Prior art keywords
- concept
- lattice
- formal
- concepts
- concept lattice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24575—Query processing with adaptation to user needs using context
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
本发明涉及计算机搜索技术领域,公开一种基于形式概念格的分面搜索方法及系统,以基于形式概念分析理论针对原始信息预构造形式概念格,并在形式概念格的基础上建立索引,从而建立分面搜索系统。本发明方法包括:构造形式概念格,在构造过程中,形式概念格使用伪造属性fake_attribute初始化最小概念Bottom;在形式概念格中建立叶子概念的索引;在查询过程中,概念的内涵对应查询语句,概念的外延对应结果集;在获取用户查询的分面值集后,在所构造的形式概念格中利用叶子概念自底向上匹配包含属性集的最小内涵以找到与分面值集对应的目标概念,并返回该目标概念所对应的外延。
The invention relates to the technical field of computer search, and discloses a faceted search method and system based on a formal concept lattice, which pre-constructs a formal concept lattice for original information based on a formal concept analysis theory, and establishes an index on the basis of the formal concept lattice, thereby Build a faceted search system. The method of the invention includes: constructing a formal concept lattice, and in the construction process, the formal concept lattice uses the fake attribute fake_attribute to initialize the minimum concept Bottom; establishing an index of the leaf concept in the formal concept lattice; in the query process, the connotation of the concept corresponds to the query statement, The extension of the concept corresponds to the result set; after obtaining the facet value set of the user query, the leaf concept is used in the constructed formal concept lattice to match the minimum connotation of the attribute set from the bottom up to find the target concept corresponding to the facet value set, and Returns the extension corresponding to the target concept.
Description
技术领域technical field
本发明涉及计算机搜索技术领域,尤其涉及一种基于形式概念格的分面搜索方法及系统。The invention relates to the technical field of computer search, in particular to a faceted search method and system based on a formal concept lattice.
背景技术Background technique
分面搜索(Faceted Search)是一种在关键词搜索的基础上,根据当前搜索结果提供上下文相关的分面信息(Facet Information)的信息检索技术。用户可以脱离系统设计人员既定的类别树,从各种各样的角度自定义感兴趣的类别。在用户指定某个分面值(Facet Value)后,系统根据该分面值对应的结果集中动态获取更细化的信息,新的细化结果能够从多个分面对结果集进行划分,帮助用户进一步了解他们感兴趣的数据信息。在整个搜索过程中,用户可以灵活地切换分面值,从而快速获取相关内容。Faceted Search is an information retrieval technology that provides context-related facet information according to the current search results on the basis of keyword search. Users can customize the category of interest from various perspectives, breaking away from the category tree established by the system designer. After the user specifies a certain facet value, the system dynamically obtains more refined information according to the result set corresponding to the facet value. The new refined result can divide the result set from multiple facets, helping the user to further Learn about the data they are interested in. Throughout the search process, users can flexibly switch facets to quickly get relevant content.
形式概念分析(Formal Concept Analysis)理论是一种针对结构化数据进行知识挖掘与分析的方法,被广泛应用于知识发现、软件工程等领域。形式概念分析的核心数据结构即为形式概念格,概念格通过哈斯图(Hasse Diagram)来表示概念与概念之间的层次结构。Formal Concept Analysis (Formal Concept Analysis) theory is a method of knowledge mining and analysis for structured data, which is widely used in knowledge discovery, software engineering and other fields. The core data structure of formal concept analysis is the formal concept lattice, and the concept lattice represents the hierarchical structure between concepts through Hasse Diagram.
目前流行的分面搜索技术主要建立在传统的关系型数据库之上,要快速检索这些内容并提供相应的分面信息是一个亟待解决的难题。The current popular faceted search technology is mainly based on the traditional relational database. It is an urgent problem to quickly retrieve these contents and provide the corresponding faceted information.
发明内容SUMMARY OF THE INVENTION
本发明目的在于公开一种基于形式概念格的分面搜索方法及系统,以基于形式概念分析理论针对原始信息预构造形式概念格,并在形式概念格的基础上建立索引,从而建立分面搜索系统。The purpose of the present invention is to disclose a faceted search method and system based on a formal concept lattice, which pre-constructs a formal concept lattice for original information based on the formal concept analysis theory, and establishes an index on the basis of the formal concept lattice, thereby establishing a faceted search system.
为实现上述目的,本发明公开了一种基于形式概念格的分面搜索方法,包括:In order to achieve the above object, the present invention discloses a faceted search method based on formal concept lattice, including:
构造形式概念格,所述形式概念格使用伪造属性fake_attribute初始化最小概念Bottom,并在每次更新概念格结构时,将输入对象Obj的属性集添加到Bottom的内涵中,最后再将fake_attribute剔除得到完整且正确的形式概念格;同时,在形式概念格中增加新对象的过程中,在输入属性集Y后,首先获取Y所唯一对应的标准生成器,如果概念格已有概念的内涵与Y相等,则Y不产生新概念,并将该新对象添加到其相等内涵所关联概念的外延中,以及将该新对象添加到此概念所有父概念的外延中;如果概念格不存在已有概念的内涵与Y相等,则创建新概念,并根据标准生成器的直接父概念计算新概念的所有候选直接父概念并筛选出真正的直接父概念,然后更新概念之间的父子关系,同时将该新对象添加到新概念所有父概念的外延中;Construct a formal concept lattice, the formal concept lattice uses the fake attribute fake_attribute to initialize the minimum concept Bottom, and each time the concept lattice structure is updated, the attribute set of the input object Obj is added to the content of Bottom, and finally the fake_attribute is eliminated to obtain a complete and the correct formal concept lattice; at the same time, in the process of adding new objects to the formal concept lattice, after inputting the attribute set Y, first obtain the standard generator uniquely corresponding to Y, if the connotation of the existing concept in the concept lattice is equal to Y , then Y does not generate a new concept, and adds the new object to the extension of the concept associated with its equivalent intension, and adds the new object to the extension of all parent concepts of this concept; if there is no existing concept in the concept lattice The connotation is equal to Y, then a new concept is created, and all candidate direct parent concepts of the new concept are calculated according to the direct parent concepts of the standard generator and the real direct parent concepts are filtered out, and then the parent-child relationship between the concepts is updated, and the new concept is The object is added to the extension of all parent concepts of the new concept;
在所述形式概念格中建立叶子概念的索引,所述叶子概念是指形式概念格中最小概念的直接父概念;establishing an index of leaf concepts in the formal concept lattice, where the leaf concepts refer to the direct parent concept of the smallest concept in the formal concept lattice;
在查询过程中,概念的内涵对应查询语句,概念的外延对应结果集;在获取用户查询的分面值集后,在所构造的形式概念格中利用叶子概念自底向上匹配包含属性集的最小内涵以找到与所述分面值集对应的目标概念,并返回该目标概念所对应的外延。In the query process, the connotation of the concept corresponds to the query statement, and the extension of the concept corresponds to the result set; after obtaining the facet value set of the user query, the leaf concept is used in the constructed formal concept lattice to match the minimum connotation of the attribute set from bottom to top to find the target concept corresponding to the set of facets, and return the extension corresponding to the target concept.
与上述方法相对应的,本发明还公开一种基于形式概念格的分面搜索系统,包括:Corresponding to the above method, the present invention also discloses a faceted search system based on formal concept lattice, including:
子系统一,用于构造形式概念格,所述形式概念格使用伪造属性fake_attribute初始化最小概念Bottom,并在每次更新概念格结构时,将输入对象Obj的属性集添加到Bottom的内涵中,最后再将fake_attribute剔除得到完整且正确的形式概念格;同时,在形式概念格中增加新对象的过程中,在输入属性集Y后,首先获取Y所唯一对应的标准生成器,如果概念格已有概念的内涵与Y相等,则Y不产生新概念,并将该新对象添加到其相等内涵所关联概念的外延中,以及将该新对象添加到此概念所有父概念的外延中;如果概念格不存在已有概念的内涵与Y相等,则创建新概念,并根据标准生成器的直接父概念计算新概念的所有候选直接父概念并筛选出真正的直接父概念,然后更新概念之间的父子关系,同时将该新对象添加到新概念所有父概念的外延中;
其中,所述子系统一还用于在所述形式概念格中建立叶子概念的索引,所述叶子概念是指形式概念格中最小概念的直接父概念;Wherein, the
子系统二,用于在查询过程中,概念的内涵对应查询语句,概念的外延对应结果集;在获取用户查询的分面值集后,在所构造的形式概念格中利用叶子概念自底向上匹配包含属性集的最小内涵以找到与所述分面值集对应的目标概念,并返回该目标概念所对应的外延。
本发明具有以下有益效果:The present invention has the following beneficial effects:
一方面,在构造形式概念格的过程中,动态更新最小概念,而避免提前载入完整的形式背景,提高了算法的灵活性。On the one hand, in the process of constructing the formal concept lattice, the minimum concept is dynamically updated, and the complete formal background is avoided to be loaded in advance, which improves the flexibility of the algorithm.
又一方面,每添加一个对象都会在原有概念格的基础上基于标准生成器进行更新操作以得到新的概念格,实现简单、快捷、可靠,更进一步地,还可以通过对概念的内涵进行哈希值计算作为索引,将所有概念保存在哈希表中,以供在确定标准生成器的过程中,结合所计算的属性集Y的哈希值自底向上搜索标准生成器以有效提高标准生成器的查询速度,避免冗余计算。On the other hand, each time an object is added, an update operation will be performed based on the standard generator on the basis of the original concept lattice to obtain a new concept lattice, which is simple, fast and reliable. The hash value calculation is used as an index, and all concepts are stored in the hash table, so that in the process of determining the standard generator, the standard generator is searched bottom-up in combination with the hash value of the calculated attribute set Y to effectively improve the standard generation. The query speed of the server is avoided, and redundant computation is avoided.
再一方面,本发明还将查询条件解析成属性集合,利用叶子概念自底向上匹配包含属性集的最小内涵,可以快速排除不可能的线路,找到目标概念。On the other hand, the present invention also parses the query conditions into attribute sets, and uses the leaf concept to match the minimum connotation including the attribute set from bottom to top, so that impossible routes can be quickly eliminated and the target concept can be found.
此外,本发明于形式概念格的分面搜索技术能够实现在用户进行检索之前预先计算结果集的上下文关系及分面信息,或者在检索的同时快速计算出结果对应的分面信息,缩短了系统响应时间,提升了用户体验同时降低了用户浏览成本。In addition, the faceted search technology of the present invention based on the formal concept lattice can realize the pre-calculation of the context relationship and facet information of the result set before the user searches, or to quickly calculate the facet information corresponding to the result at the same time of retrieval, which shortens the system time. The response time improves the user experience and reduces the user browsing cost.
下面将参照附图,对本发明作进一步详细的说明。The present invention will be described in further detail below with reference to the accompanying drawings.
附图说明Description of drawings
构成本申请的一部分的附图用来提供对本发明的进一步理解,本发明的示意性实施例及其说明用于解释本发明,并不构成对本发明的不当限定。在附图中:The accompanying drawings constituting a part of the present application are used to provide further understanding of the present invention, and the exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention. In the attached image:
图1本发明实施例公开的一种形式概念格的示例图;1 is an example diagram of a formal concept lattice disclosed in an embodiment of the present invention;
图2本发明实施例公开的基于形式概念格的分面搜索方法流程图;2 is a flowchart of a faceted search method based on a formal concept lattice disclosed in an embodiment of the present invention;
图3为本发明实施例公开的一种形式概念格构造过程示意图。FIG. 3 is a schematic diagram of a construction process of a formal concept lattice disclosed in an embodiment of the present invention.
具体实施方式Detailed ways
为便于本领域技术人员对本发明做充分理解,对本发明相关术语、定理及推论详述如下:For the convenience of those skilled in the art to fully understand the present invention, the relevant terms, theorems and inferences of the present invention are described in detail as follows:
定义1.1:设U是对象的集合,M是属性的集合,I是两个集合U和M间的关系,则称三元组为一个形式背景(简称背景)。(u,m)∈I(或写作uIm)表示对象u具有属性m。背景可以用一个矩形的表来表示,它的每一行是一个对象,每一列是一个属性。若u行m列的交叉处是×,则表示对象u具有属性m。如表1给出示例。Definition 1.1: Let U be a set of objects, M be a set of attributes, and I be the relationship between two sets U and M, then it is called a triple For a form background (referred to as background). (u, m) ∈ I (or written as uIm) means that object u has attribute m. The background can be represented by a rectangular table, where each row is an object and each column is an attribute. If the intersection of u row and m column is ×, it means that object u has attribute m. Examples are given in Table 1.
表1示例背景 Table 1 Example Background
定义1.2:设是一个背景,若令Definition 1.2: Let is a background, if make
及and
如果A,B满足f(A)=B,g(B)=A则称二元组(A,B)是一个概念(Concept)。A是概念(A,B)的外延(Extent),B是概念(A,B)的内涵(Intent)。用或表示背景上的所有概念的集合。If A and B satisfy f(A)=B, g(B)=A, then the binary group (A, B) is called a concept (Concept). A is the extension (Extent) of the concept (A, B), and B is the connotation (Intent) of the concept (A, B). use or show background Collection of all concepts on .
推论1.1:对于任意A是一个外延当且仅当A=g(f(A));对于任意B是一个内涵当且仅当B=f(g(B))。Corollary 1.1: For any A is an extension if and only if A=g(f(A)); for any B is an intension if and only if B=f(g(B)).
证明:(1)充分性:对于任意存在二元组(A,f(A)),明显有f(A)=f(A),又因为g(f(A))=A,所以二元组(A,f(A))是一个概念(定义2.2),则A是一个外延。Proof: (1) Sufficient: For any There is a dyad (A, f(A)), obviously f(A)=f(A), and because g(f(A))=A, the dyad (A, f(A)) is a concept (definition 2.2), then A is an extension.
(2)必要性:因为A是一个外延,所以二元组(A,f(A))是一个概念,则A=g(f(A))。(2) Necessity: Since A is an extension, the binary group (A, f(A)) is a concept, then A=g(f(A)).
这里证明外延部分,对于内涵同理可证。The extension part is proved here, and the same can be proved for the intension.
定理2.1:设(U,M,I)是一个背景,A,A1,A2∈U,B,B1,B2∈M,则有:Theorem 2.1: Let (U, M, I) be a background, A, A 1 , A 2 ∈ U, B, B 1 , B 2 ∈ M, then we have:
一、one,
(1) (1)
(2) (2)
(3)f(A)=f(g(f(A)))(3) f(A)=f(g(f(A)))
(4)f(A1)∩f(A2)=f(A1∪A2)(4)f(A 1 )∩f(A 2 )=f(A 1 ∪A 2 )
(5) (5)
(6)g(f(A))=g(f(g(f(A))))(6) g(f(A))=g(f(g(f(A))))
(7) (7)
二、two,
(1`) (1`)
(2`) (2`)
(3`)g(B)=g(f(g(B)))(3`)g(B)=g(f(g(B)))
(4`)g(B1)∩g(B2)=g(B1UB2)(4`)g(B 1 )∩g(B 2 )=g(B 1 UB 2 )
(5`) (5`)
(6`)f(g(B))=f(g(f(g(B))))(6`)f(g(B))=f(g(f(g(B))))
定理2.2:对任何U的子集A,f(A)一定是内涵,因为形如(g(f(A)),f(A))一定是概念;同样有对任何M的子集B,g(B)一定是外延,因为(g(B),f(g(B))一定是概念。Theorem 2.2: For any subset A of U, f(A) must be an intension, because the form (g(f(A)), f(A)) must be a concept; also for any subset B of M, g(B) must be an extension, because (g(B), f(g(B)) must be a concept.
推论2.2:内涵的交仍是内涵,外延的交仍是外延。Corollary 2.2: The intersection of connotation is still connotation, and the intersection of extension is still extension.
证明:对于任意两个概念(A1,B1),(A2,B2),根据定理2.1(4)、2.1(4`)得B1∩B2=f(A1)∩f(A2)=f(A1∪A2),根据定理2.2对于f(A1∪A2)一定是内涵,同理有A1∩A2=g(B1)∩g(B2)=g(B1∪B2),易证g(B1∪B2)一定是外延。证毕。Proof: For any two concepts (A 1 , B 1 ), (A 2 , B 2 ), according to Theorems 2.1(4) and 2.1(4`), B 1 ∩B 2 =f(A 1 )∩f( A 2 )=f(A 1 ∪A 2 ), according to Theorem 2.2 for f(A 1 ∪A 2 ) must be the connotation. Similarly, A 1 ∩A 2 =g(B 1 )∩g(B 2 )=g(B 1 ∪B 2 ), and it is easy to prove that g(B 1 ∪B 2 ) must be epitaxy. Certificate completed.
定义2.3:若(A1,B1),(A2,B2)是某个背景上的两个概念。而且(由定理2.1(1),易知等价于),则我们称(A1,B1)是(A2,B2)的子概念,(A2,B2)是(A1,B1)的父概念,并记作(A1,B1)≤(A2,B2)。如果且不存在概念(A3,B3)使得则称(A1,B1)是(A2,B2)的直接子概念,(A2,B2)是(A1,B1)的直接父概念,记作(A1,B1)<(A2,B2)。由此得到一个半序集其中表示(U,M,I)的所有概念。Definition 2.3: If (A 1 , B 1 ), (A 2 , B 2 ) is a background on the two concepts. and (From Theorem 2.1(1), it is easy to know that it is equivalent to ), then we call (A 1 , B 1 ) the child concept of (A 2 , B 2 ), and (A 2 , B 2 ) the parent concept of (A 1 , B 1 ), and denoted (A 1 , B 1 ), B 1 )≤(A 2 , B 2 ). if and there is no concept (A 3 , B 3 ) such that Then (A 1 , B 1 ) is said to be the direct child concept of (A 2 , B 2 ), and (A 2 , B 2 ) is the direct parent concept of (A 1 , B 1 ), denoted as (A 1 , B 1 ) )<(A 2 , B 2 ). This results in a semi-ordered set in Represents all concepts of (U, M, I).
上述定义2.3中得到的半序集被称为形式概念格。其中令 则称图(V,E)为形式概念格的哈斯图。根据形式背景示例表1可以画出其对应形式概念格如图1。The semi-ordered set obtained in the above definition 2.3 is called the formal concept lattice. which order Then the graph (V, E) is called a Haas diagram of the formal concept lattice. According to the form background example Table 1, the corresponding form concept grid can be drawn as shown in Figure 1.
定义2.4:形式概念格中直接子概念是形式概念格中最小概念的概念被称为叶子概念。叶子概念有且仅有最小概念作为子概念。例如图1中,概念(1,abef)和概念(4,abcdf)是叶子概念。Definition 2.4: Concepts whose direct subconcepts in the formal concept lattice are the smallest concepts in the formal concept lattice are called leaf concepts. Leaf concepts have and only minimal concepts as subconcepts. For example, in Figure 1, the concept (1, abef) and the concept (4, abcdf) are leaf concepts.
定义2.5:在形式概念格中,某一属性m的支持数等于内涵蕴含该属性的最大概念的外延的集合基数;某一概念的支持数等于其外延的集合基数。如图1中,属性f的支持数为4,概念(1,abef)的支持数为1。Definition 2.5: In the formal concept lattice, the support number of an attribute m is equal to the extension set cardinality of the largest concept that contains the attribute; the support number of a concept is equal to its extension set cardinality. As shown in Figure 1, the support number of attribute f is 4, and the support number of concept (1, abef) is 1.
定义3.1:设L是对应形式背景形式概念格,在形式背景中加入一个新对象o后得到新的形式背景所对应的形式概念格为L,其中用f`(o)表示o所拥有的属性。令(A,B)为L`中任意一个概念,则Definition 3.1: Let L be the corresponding formal context Form concept grid, on form background After adding a new object o to get a new form background The corresponding formal concept lattice is L, where f`(o) is used to denote the attributes possessed by o. Let (A, B) be any concept in L`, then
(1)(A,B)是一个新概念当且仅当B不是L中任何概念的内涵。(1) (A, B) is a new concept if and only if B is not an intension of any concept in L.
(2)(A,B)是一个改良(Modified)概念当且仅当 (2) (A, B) is a Modified concept if and only if
(3)(A,B)如果在L中已存在,则称之为旧概念。(3) (A, B) is called the old concept if it already exists in L.
(4)对于任意一个新概念(X,Y),如果B∩f`(o)=Y≠B,则称概念(A,B)为(X,Y)的一个生成器。最小概念是任意其他概念的生成器。(4) For any new concept (X, Y), if B∩f`(o)=Y≠B, then the concept (A,B) is called a generator of (X,Y). A minimal concept is a generator of any other concept.
(5)一般情况下,一个新概念(X,Y)会有多个生成器(至少有一个生成器),其中标准生成器是其他所有生成器的父概念,即为所有生成器的上确界。(5) In general, a new concept (X, Y) will have multiple generators (at least one generator), of which the standard generator is the parent concept of all other generators, that is, the upper limit of all generators boundary.
推论3.1:对于任意新概念(X,Y)有且仅有一个标准生成器。证明:Corollary 3.1: There is one and only one standard generator for any new concept (X, Y). prove:
(1)如果(X,Y)只有一个生成器,则其为标准生成器。(1) If (X, Y) has only one generator, it is a standard generator.
(2)如果(X,Y)有多个标准生成器,这与定义相违背,不必讨论。(2) If (X, Y) has multiple standard generators, this is against the definition and need not be discussed.
(3)如果不存在生成器,则有反证。假设这些生成器中有两个极大概念(A1,B1),(A2,B2),则有那么根据推论2.2,B1∩B2一定是一个内涵,那么一定存在概念(g(B1∩B2),B1∩B2)也是(X,Y)的一个生成器,因为这与(A1,B1),(A2,B2)是极大概念相矛盾。故该假设不成立。(3) If there is no generator, there is disproven. Assuming that there are two maximal concepts (A 1 , B 1 ), (A 2 , B 2 ) in these generators, we have So According to Corollary 2.2, B 1 ∩ B 2 must be an intension, then there must be a concept (g(B 1 ∩ B 2 ), B 1 ∩ B 2 ) is also a generator of (X, Y), because This contradicts the concept that (A 1 , B 1 ), (A 2 , B 2 ) are maximal. Therefore, this assumption does not hold.
综上,(X,Y)有且仅有一个标准生成器。To sum up, (X, Y) has one and only one standard generator.
标准生成器是一个很重要的概念,在算法构造概念格的过程中,首先需要找到新概念(X,Y)的标准生成器。实际上标准生成器是当前生成概念格中新概念的直接子概念。那么新概念又是从何而来呢?我们知道在一个已知形式概念格L中添加一个新的对象o可以得到新的形式概念格L`,说明o是导致L更新的因素。对此有如下结论:The standard generator is a very important concept. In the process of constructing the concept lattice, the standard generator of the new concept (X, Y) needs to be found first. In fact standard generators are direct subconcepts of new concepts in the current generative concept lattice. So where did the new concept come from? We know that adding a new object o to a known formal concept lattice L can get a new formal concept lattice L`, indicating that o is the factor that causes L to update. Here are the conclusions:
推论3.2:根据定义3.1在概念格L加入新对象o,则有:Corollary 3.2: According to the definition 3.1, adding a new object o to the concept lattice L, we have:
(1)对象o所对应的属性集f`(o)在新概念格L`中一定是一个内涵。如果f`(o)不是L中任何概念的内涵,则(g`(f`(o)),f`(o))是一个新概念;否则,L`中没有新概念。(1) The attribute set f`(o) corresponding to the object o must be a connotation in the new conceptual lattice L`. If f`(o) is not the intension of any concept in L, then (g`(f`(o)), f`(o)) is a new concept; otherwise, there is no new concept in L`.
(2)如果产生了新概念(A,B),L中所有内涵与B的非空交集也有可能产生新内涵(推论2.2),由此产生更多新概念。(2) If a new concept (A, B) is produced, the non-empty intersection of all intensions in L and B may also produce new intensions (Corollary 2.2), thereby producing more new concepts.
(3)新概念的直接子概念是其标准生成器。(3) The direct child concept of a new concept is its standard generator.
(4)新概念的直接父概念可能是标准生成器的直接父概念,或是新概念与标准生成器的直接父概念内涵交集所对应的概念。(4) The direct parent concept of the new concept may be the direct parent concept of the standard generator, or the concept corresponding to the intersection of the connotation of the new concept and the direct parent concept of the standard generator.
(5)对象o的加入一定会导致改良概念的产生,具体而言(g`(f`(o)),f`(o))的所有父概念都需要将对象o加入其外延。(5) The addition of object o will definitely lead to the generation of improved concepts. Specifically, all parent concepts of (g`(f`(o)), f`(o)) need to add object o to its extension.
综上,形式背景由三元组(U,M,I)表示,其中U表示对象的集合,M表示属性集合,I表示对象和属性之间的关系。分面搜索中也有对象和属性的概念,并通过分面对属性进行了划分,这是分面搜索与形式概念分析的共通点。如果用F表示分面的集合,FV表示分面对应的所有分面值,则可以使用二元组(F,FV)表示属性集M。这样就得到了形式背景的新的表达式(U,(F,FV),I)或者四元组(U,F,M,I)。根据形式背景(U,F,M,I)可以得到相应的形式概念格其中表示所有的概念。新增的分面丰富了属性的语义,同时对于形式概念格原有的结构没有影响。分面搜索与形式概念分析在逻辑概念上有良好的契合度,形式概念分析可以对分面搜索的所有操作,包括语义相关的操作提供支持。To sum up, the formal context is represented by a triple (U, M, I), where U represents a set of objects, M represents a set of attributes, and I represents the relationship between objects and attributes. There are also concepts of objects and attributes in faceted search, and attributes are divided by faceting, which is the common point of faceted search and formal concept analysis. If F represents the set of facets, and FV represents all facet values corresponding to the facets, the attribute set M can be represented by a two-tuple (F, FV). This results in a new expression (U, (F, FV), I) or quadruple (U, F, M, I) of the formal context. According to the formal background (U, F, M, I), the corresponding formal concept lattice can be obtained in represent all concepts. The new facets enrich the semantics of attributes without affecting the original structure of the formal concept lattice. Faceted search and formal concept analysis have a good fit in logical concepts, and formal concept analysis can provide support for all operations of faceted search, including semantically related operations.
根据以上定义、定理及推论,以下结合附图对本发明的实施例进行详细说明,但是本发明可以由权利要求限定和覆盖的多种不同方式实施。Based on the above definitions, theorems and inferences, the embodiments of the present invention are described in detail below with reference to the accompanying drawings, but the present invention can be implemented in many different ways as defined and covered by the claims.
实施例1Example 1
本实施例公开一种基于形式概念格的分面搜索方法,如图2所示,包括:This embodiment discloses a faceted search method based on formal concept lattice, as shown in FIG. 2 , including:
步骤S1、构造形式概念格,该形式概念格使用伪造属性fake_attribute初始化最小概念Bottom,并在每次更新概念格结构时,将输入对象Obj的属性集添加到Bottom的内涵中,最后再将fake_attribute剔除得到完整且正确的形式概念格;同时,在形式概念格中增加新对象的过程中,在输入属性集Y后,首先获取Y所唯一对应的标准生成器,如果概念格已有概念的内涵与Y相等,则Y不产生新概念,并将该新对象添加到其相等内涵所关联概念的外延中,以及将该新对象添加到此概念所有父概念的外延中;如果概念格不存在已有概念的内涵与Y相等,则创建新概念,并根据标准生成器的直接父概念计算新概念的所有候选直接父概念并筛选出真正的直接父概念,然后更新概念之间的父子关系,同时将该新对象添加到新概念所有父概念的外延中。Step S1, construct a formal concept lattice, which uses the fake attribute fake_attribute to initialize the minimum concept Bottom, and each time the concept lattice structure is updated, the attribute set of the input object Obj is added to the connotation of Bottom, and finally the fake_attribute is eliminated. Obtain a complete and correct formal concept lattice; at the same time, in the process of adding new objects to the formal concept lattice, after inputting the attribute set Y, first obtain the standard generator uniquely corresponding to Y, if the concept lattice has the connotation of the concept and If Y is equal, then Y does not generate a new concept, and the new object is added to the extension of the concept associated with its equal intension, and the new object is added to the extension of all parent concepts of this concept; if the concept lattice does not exist already The connotation of the concept is equal to Y, then create a new concept, and calculate all the candidate direct parent concepts of the new concept according to the direct parent concepts of the standard generator and filter out the real direct parent concepts, then update the parent-child relationship between the concepts, and at the same time, This new object is added to the extension of all parent concepts of the new concept.
在该步骤中,形式概念格可采用静态构造模式、动态构造模式或基于构造时间预测的混合式构造模式;其中,混合式构造模式是根据分面值集预测概念构造耗时,如果小于预设的时间阈值,则采用动态模式构造形式概念格,若大于预设的时间阈值,则采用静态模式构造形式概念格。In this step, the formal concept lattice can adopt a static construction mode, a dynamic construction mode or a hybrid construction mode based on construction time prediction; wherein, the hybrid construction mode is based on the face value set prediction concept construction time, if less than a preset If the time threshold is greater than the preset time threshold, the dynamic mode is used to construct the formal concept grid. If the time threshold is greater than the preset time threshold, the formal concept grid is constructed using the static mode.
与上述步骤相对的,现有的最小概念需要获知形式背景(U,M,I)中所有的属性组成的集合M,必须预先载入整个背景,导致闲置的背景数据浪费了有限的内存资源,数据量过于庞大甚至会导致算法无法正常运行。In contrast to the above steps, the existing minimal concept needs to know the set M composed of all attributes in the formal background (U, M, I), and the entire background must be preloaded, resulting in idle background data wasting limited memory resources, Too much data can even cause algorithms to fail to function properly.
为了方便表达构造过程,本实施例使用[对象编号,对象属性集]表示对象实体,二元组(外延,内涵)表示概念,简略形式表示集合,如abc={a,b,c},123={1,2,3}。如图3所示,展示了向已经加入对象[1,abef]和对象[2,bcf]的概念格中加入对象[3,cdf]的过程,省略了外延更新的过程。图中实心圆表示预构造的最小概念(伪造属性被省略),虚线圆表示新构造的概念,实线圆表示概念格中旧概念。图3(a)表示未加入对象[3,cdf]的概念格结构。图3(b)表示加入了对象[3,cdf]生成了新概念(3,cdf),其对应的标准生成器为更新内涵后的最小概念此时还未计算新概念的直接父概念。图3(c)表示根据标准生成器的直接父概念(1,abef)计算出概念(3,cdf)的一个候选直接父概念为新概念(123,f),而(123,f)的直接子概念为其标准生成器(12,bf)。图3(d)表示根据的直接父概念(2,bcf)计算出(3,cdf)的另一个候选直接父概念为新概念(23,cf)。这里要说明的是(23,cf)的标准生成器(2,bcf)还有其他直接父概念(12,bf),所以还需要根据(12,cf)为新概念(23,cf)计算其直接父概念,恰好是已存在的(123,f),故这个步骤没有新概念生成。最后在(3,cdf)的候选直接父概念中选出真正的直接父概念(23,cf),更新概念之间父子关系,概念格构造完成。In order to express the construction process conveniently, this embodiment uses [object number, object attribute set] to represent the object entity, two-tuple (extension, connotation) represents the concept, and the abbreviated form represents the set, such as abc={a, b, c}, 123 ={1, 2, 3}. As shown in Figure 3, it shows the process of adding object [3, cdf] to the concept lattice that has added object [1, abef] and object [2, bcf], omitting the process of extension update. The solid circles in the figure represent the pre-constructed minimal concepts (forgery attributes are omitted), the dashed circles represent the newly constructed concepts, and the solid circles represent the old concepts in the concept grid. Figure 3(a) shows the concept lattice structure without adding the object [3, cdf]. Figure 3(b) shows that the object [3, cdf] is added to generate a new concept (3, cdf), and the corresponding standard generator is the minimal concept after updating the connotation The immediate parent concept of the new concept has not been calculated at this time. Figure 3(c) shows that a candidate direct parent concept of concept (3, cdf) is calculated according to the direct parent concept (1, abef) of the standard generator as the new concept (123, f), and the direct parent concept of (123, f) is The subconcept is its standard generator (12, bf). Figure 3(d) shows that according to The direct parent concept of (2, bcf) computes another candidate direct parent concept of (3, cdf) as the new concept (23, cf). What is to be explained here is that the standard generator (2, bcf) of (23, cf) has other direct parent concepts (12, bf), so it is also necessary to calculate its new concept (23, cf) according to (12, cf) The immediate parent concept happens to be the existing (123, f), so no new concept is generated in this step. Finally, the real direct parent concept (23, cf) is selected from the candidate direct parent concepts of (3, cdf), the parent-child relationship between concepts is updated, and the concept lattice construction is completed.
可选的,本实施例以改进的AddIntent算法实现上述对象添加的伪代码如下表2:Optionally, this embodiment uses the improved AddIntent algorithm to implement the pseudocode of the above object addition as shown in Table 2 below:
表2:Table 2:
如表2所示,函数AddIntent有三个参数:属性集合Y,Y对应的一个生成器canonical_generator,形式概念格L。在输入属性集Y后,算法第1行至第4行首先获取Y所对应的标准生成器,如果概念格L已有概念的内涵等于Y,则Y不会产生新概念,直接返回找到的概念,算法结束;否则,算法第5至第14行进一步根据标准生成器的直接父概念计算新概念的所有候选直接父概念;第15行至第16行生成一个新概念;第17行至第30行排除候选直接父概念中不符合要求的概念,然后删除可能存在的旧关联,设置新概念与直接父概念的关联;第31行至第32行更新新概念与标准生成器之间的父子关系并返回新概念,算法结束。其中,调用的函数GetCanonicalGenerator用来获取Y相应的标准生成器。As shown in Table 2, the function AddIntent has three parameters: attribute set Y, a generator corresponding to Y, canonical_generator, and formal concept lattice L. After inputting the attribute set Y, the first line to the fourth line of the algorithm first obtains the standard generator corresponding to Y. If the connotation of the existing concept in the concept lattice L is equal to Y, then Y will not generate a new concept, and directly return the found concept , the algorithm ends; otherwise, lines 5 to 14 of the algorithm further calculate all candidate direct parent concepts of the new concept according to the direct parent concepts of the standard generator; lines 15 to 16 generate a new concept; lines 17 to 30 Lines exclude unsatisfactory concepts in candidate direct parent concepts, then delete possible old associations, and set the association of new concepts with direct parent concepts; lines 31 to 32 update the parent-child relationship between the new concept and the standard generator And return the new concept, the algorithm ends. Among them, the called function GetCanonicalGenerator is used to obtain the corresponding standard generator of Y.
优选地,在该步骤S1中,通过对概念的内涵进行哈希值计算作为索引,将所有概念保存在哈希表中;在确定标准生成器的过程中,结合所计算的属性集Y的哈希值自底向上搜索标准生成器;藉此,以有效提高标准生成器的查询速度,避免冗余计算。作为一种变劣的实现:也可以通过自底向上的方式将属性集Y与概念的内涵逐一比较,直到找到某一概念的所有父概念都不是属性Y对应概念的生成器,则该概念为标准生成器。Preferably, in this step S1, all concepts are stored in the hash table by calculating the hash value of the connotation of the concept as an index; in the process of determining the standard generator, combined with the calculated hash value of the attribute set Y The standard generator is searched from the bottom up by the value of the value; thereby, the query speed of the standard generator can be effectively improved, and redundant calculation is avoided. As a deteriorating implementation: the attribute set Y can also be compared with the connotation of the concept one by one in a bottom-up manner, until all parent concepts of a concept are found that are not generators of the concept corresponding to the attribute Y, then the concept is Standard generator.
步骤S2、在形式概念格中建立叶子概念的索引,叶子概念是指形式概念格中最小概念的直接父概念。In step S2, an index of a leaf concept is established in the formal concept lattice, and a leaf concept refers to the direct parent concept of the smallest concept in the formal concept lattice.
步骤S3、在查询过程中,概念的内涵对应查询语句,概念的外延对应结果集;在获取用户查询的分面值集后,在所构造的形式概念格中利用叶子概念自底向上匹配包含属性集的最小内涵以找到与分面值集对应的目标概念,并返回该目标概念所对应的外延。Step S3: In the query process, the connotation of the concept corresponds to the query statement, and the extension of the concept corresponds to the result set; after obtaining the facet value set of the user query, the leaf concept is used in the constructed formal concept lattice to match the inclusive attribute set from bottom to top. to find the target concept corresponding to the set of facets, and return the extension corresponding to the target concept.
在该步骤中,分面搜索通过分面的方式,对事物进行归纳检索,常用于专业领域的垂直搜索引擎当中。分面搜索的操作规范没有规定,典型的分面搜索操作分为两个部分,关键词搜索和交互式细化搜索。关键词搜索是传统的信息检索技术,分面搜索技术在此基础之上让用户可以进行交互式的探索,对现有的关键词搜索结果进行进一步的细化分解,提高了搜索结果的准确率,剔除了大多数无关结果集,从而可以获得良好的用户体验。In this step, faceted search conducts inductive retrieval of things in a faceted manner, which is often used in vertical search engines in professional fields. The operation specification of faceted search is not specified, and a typical faceted search operation is divided into two parts, keyword search and interactive refinement search. Keyword search is a traditional information retrieval technology. On this basis, faceted search technology allows users to conduct interactive exploration, further refine and decompose existing keyword search results, and improve the accuracy of search results. , which eliminates most irrelevant result sets, so that a good user experience can be obtained.
分面搜索具体而言有以下几个步骤:The faceted search has the following steps:
(1)在通过某一次关键词搜索后得到关于关键词的结果集,并计算出结果集的分面信息,同时展示出来。(1) After a certain keyword search, a result set about the keyword is obtained, and the facet information of the result set is calculated and displayed at the same time.
(2)用户根据展示的结果集及分面选择感兴趣的分面值进一步细化结果。(2) The user selects the face value of interest according to the displayed result set and facet to further refine the result.
(3)重新搜索相关结果集。(3) Re-search the relevant result set.
这样可以交互式循环查询直到用户获得满意的结果或者结果集为空为止。In this way, the query can be looped interactively until the user obtains a satisfactory result or the result set is empty.
进一步地,本实施例还包括:Further, this embodiment also includes:
在找到目标概念后,根据形式概念格的上下文计算与所述目标概念对应的分面值支持数,该分面值支持数是指分面值对应的对象的个数。比如:针对NBA球员可以有如下分面:姓名、年龄、身高、体重、球员类型及所属球队,每一个分面都包含许多属性,如分面——“球员类型”包括中锋、大前锋、小前锋等等,假设NBA联盟中有中锋球员150名,那么可以用(‘球员类型’,‘中锋’,150)表示。After the target concept is found, the facet value support number corresponding to the target concept is calculated according to the context of the formal concept lattice, where the facet value support number refers to the number of objects corresponding to the facet value. For example: NBA players can have the following facets: name, age, height, weight, player type and team, each facet contains many attributes, such as facets - "player type" includes center, power forward, Small forward, etc., assuming that there are 150 center players in the NBA, it can be represented by ('player type', 'center', 150).
进一步地,本实施例还包括:Further, this embodiment also includes:
记录用户的查询记录,返回多个历史查询结果的最大公共父概念和最小公共子概念作为查询推荐;或者根据概念相似度进行查询推荐。其中:获取最大公共父概念的分两个步骤:Record the user's query records, and return the largest common parent concept and the smallest common child concept of multiple historical query results as query recommendation; or perform query recommendation based on concept similarity. Among them: There are two steps to obtain the maximum common parent concept:
(1)、求所有历史查询对应概念的内涵的交集;(1), find the intersection of the connotations of the corresponding concepts of all historical queries;
(2)、使用(1)的结果进行查询,匹配的概念即为最大公共父概念。(2) Use the result of (1) to query, and the matched concept is the largest common parent concept.
同理,获取最大公共子概念的方法包括:Similarly, the methods to obtain the largest common subconcept include:
(1)、求所有历史查询对应概念的内涵的并集;(1), seek the union of the connotations of the corresponding concepts of all historical queries;
(2)、使用(1)的结果进行查询,匹配的概念即为最小公共子概念。(2) Use the result of (1) to query, and the matched concept is the least common sub-concept.
可选的,本实施例比较概念相似性的定义如下。Optionally, in this embodiment, the definition of conceptual similarity is as follows.
定义4.1:对于形式概念格中任意两个概念A,B,使用extent表示外延,则Jaccard系数为:Definition 4.1: For the formal concept lattice In any two concepts A, B, use extent to represent extension, then the Jaccard coefficient is:
设定当前概念为C,Jaccard相似度为k,泛化操作为当前概念搜索相似度不小于k的最大父概念,细化操作为当前概念搜索相似度不小于k的最小子概念。对于泛化操作,算法从C出发以广度优先遍历的方式访问其父概念,直到某一父概念的所有父概念与C的相似度均小于k,该概念为泛化目标概念;对于细化操作,算法从C出发以广度优先遍历的方式访问其子概念,直到某一子概念的所有子概念与C的相似度均小于k,该概念为细化目标概念。The current concept is set as C, the Jaccard similarity is k, the generalization operation is to search for the largest parent concept with a similarity of not less than k, and the refinement operation is to search for the smallest sub-concept of the current concept with a similarity of not less than k. For the generalization operation, the algorithm starts from C to access its parent concepts in a breadth-first traversal manner, until all the parent concepts of a parent concept have a similarity with C less than k, the concept is the generalization target concept; for the refinement operation , the algorithm starts from C and accesses its sub-concepts in a breadth-first traversal manner, until all sub-concepts of a sub-concept have a similarity with C less than k, which is the refinement target concept.
综上,本实施例公开的基于形式概念格的分面搜索方法,具有以下有益效果:To sum up, the faceted search method based on the formal concept lattice disclosed in this embodiment has the following beneficial effects:
一方面,在构造形式概念格的过程中,动态更新最小概念,而避免提前载入完整的形式背景,提高了算法的灵活性。On the one hand, in the process of constructing the formal concept lattice, the minimum concept is dynamically updated, and the complete formal background is avoided to be loaded in advance, which improves the flexibility of the algorithm.
又一方面,每添加一个对象都会在原有概念格的基础上基于标准生成器进行更新操作以得到新的概念格,实现简单、快捷、可靠,更进一步地,还可以通过对概念的内涵进行哈希值计算作为索引,将所有概念保存在哈希表中,以供在确定标准生成器的过程中,结合所计算的属性集Y的哈希值自底向上搜索标准生成器以有效提高标准生成器的查询速度,避免冗余计算。On the other hand, each time an object is added, an update operation will be performed based on the standard generator on the basis of the original concept lattice to obtain a new concept lattice, which is simple, fast and reliable. The hash value calculation is used as an index, and all concepts are stored in the hash table, so that in the process of determining the standard generator, the standard generator is searched bottom-up in combination with the hash value of the calculated attribute set Y to effectively improve the standard generation. The query speed of the server is avoided, and redundant computation is avoided.
再一方面,本发明还将查询条件解析成属性集合,利用叶子概念自底向上匹配包含属性集的最小内涵,可以快速排除不可能的线路,找到目标概念。On the other hand, the present invention also parses the query conditions into attribute sets, and uses the leaf concept to match the minimum connotation including the attribute set from bottom to top, so that impossible routes can be quickly eliminated and the target concept can be found.
此外,本发明于形式概念格的分面搜索技术能够实现在用户进行检索之前预先计算结果集的上下文关系及分面信息,或者在检索的同时快速计算出结果对应的分面信息,缩短了系统响应时间,提升了用户体验同时降低了用户浏览成本。In addition, the faceted search technology of the present invention based on the formal concept lattice can realize the pre-calculation of the context relationship and facet information of the result set before the user searches, or to quickly calculate the facet information corresponding to the result at the same time of retrieval, which shortens the system time. The response time improves the user experience and reduces the user browsing cost.
实施例2Example 2
与上述方法实施例相对应的,本实施例公开一种基于形式概念格的分面搜索系统,至少包括下述子系统一及子系统二,优选地,还可以进一步包括后续的子系统三和/或子系统四。Corresponding to the above method embodiments, the present embodiment discloses a faceted search system based on a formal concept lattice, which at least includes the following
子系统一,用于构造形式概念格,所述形式概念格使用伪造属性fake_attribute初始化最小概念Bottom,并在每次更新概念格结构时,将输入对象Obj的属性集添加到Bottom的内涵中,最后再将fake_attribute剔除得到完整且正确的形式概念格;同时,在形式概念格中增加新对象的过程中,在输入属性集Y后,首先获取Y所唯一对应的标准生成器,如果概念格已有概念的内涵与Y相等,则Y不产生新概念,并将该新对象添加到其相等内涵所关联概念的外延中,以及将该新对象添加到此概念所有父概念的外延中;如果概念格不存在已有概念的内涵与Y相等,则创建新概念,并根据标准生成器的直接父概念计算新概念的所有候选直接父概念并筛选出真正的直接父概念,然后更新概念之间的父子关系,同时将该新对象添加到新概念所有父概念的外延中。
其中,上述子系统一还用于在所述形式概念格中建立叶子概念的索引,所述叶子概念是指形式概念格中最小概念的直接父概念。进一步地,本实施例的子系统一还用于:通过对概念的内涵进行哈希值计算作为索引,将所有概念保存在哈希表中;在确定标准生成器的过程中,结合所计算的属性集Y的哈希值自底向上搜索标准生成器。Wherein, the above-mentioned
子系统二,用于在查询过程中,概念的内涵对应查询语句,概念的外延对应结果集;在获取用户查询的分面值集后,在所构造的形式概念格中利用叶子概念自底向上匹配包含属性集的最小内涵以找到与所述分面值集对应的目标概念,并返回该目标概念所对应的外延。
子系统三,用于在找到目标概念后,根据形式概念格的上下文计算与所述目标概念对应的分面值支持数,所述分面值支持数是指分面值对应的对象的个数。The third subsystem is used to calculate the facet value support number corresponding to the target concept according to the context of the formal concept lattice after finding the target concept, where the facet value support number refers to the number of objects corresponding to the facet value.
子系统四,用于记录用户的查询记录,返回多个历史查询结果的最大公共父概念和最小公共子概念作为查询推荐;或者根据概念相似度进行查询推荐。The fourth subsystem is used to record the user's query records, and return the largest common parent concept and the smallest common sub-concept of multiple historical query results as query recommendation; or perform query recommendation according to concept similarity.
本系统中,形式概念格采用静态构造模式、动态构造模式或基于构造时间预测的混合式构造模式;所述混合式构造模式根据分面值集预测概念构造耗时,如果小于预设的时间阈值,则采用动态模式构造形式概念格,若大于预设的时间阈值,则采用静态模式构造形式概念格。In this system, the formal concept lattice adopts a static construction mode, a dynamic construction mode or a hybrid construction mode based on construction time prediction; the hybrid construction mode predicts the concept construction time according to the facet value set, if it is less than a preset time threshold, The dynamic mode is used to construct the formal concept lattice, and if it is greater than the preset time threshold, the static mode is used to construct the formal concept lattice.
同理,本实施例公开的基于形式概念格的分面搜索系统,具有以下有益效果:Similarly, the faceted search system based on the formal concept lattice disclosed in this embodiment has the following beneficial effects:
一方面,在构造形式概念格的过程中,动态更新最小概念,而避免提前载入完整的形式背景,提高了算法的灵活性。On the one hand, in the process of constructing the formal concept lattice, the minimum concept is dynamically updated, and the complete formal background is avoided to be loaded in advance, which improves the flexibility of the algorithm.
又一方面,每添加一个对象都会在原有概念格的基础上基于标准生成器进行更新操作以得到新的概念格,实现简单、快捷、可靠,更进一步地,还可以通过对概念的内涵进行哈希值计算作为索引,将所有概念保存在哈希表中,以供在确定标准生成器的过程中,结合所计算的属性集Y的哈希值自底向上搜索标准生成器以有效提高标准生成器的查询速度,避免冗余计算。On the other hand, each time an object is added, an update operation will be performed based on the standard generator on the basis of the original concept lattice to obtain a new concept lattice, which is simple, fast and reliable. The hash value calculation is used as an index, and all concepts are stored in the hash table, so that in the process of determining the standard generator, the standard generator is searched bottom-up in combination with the hash value of the calculated attribute set Y to effectively improve the standard generation. The query speed of the server is avoided, and redundant computation is avoided.
再一方面,本发明还将查询条件解析成属性集合,利用叶子概念自底向上匹配包含属性集的最小内涵,可以快速排除不可能的线路,找到目标概念。On the other hand, the present invention also parses the query conditions into attribute sets, and uses the leaf concept to match the minimum connotation including the attribute set from bottom to top, so that impossible routes can be quickly eliminated and the target concept can be found.
此外,本发明于形式概念格的分面搜索技术能够实现在用户进行检索之前预先计算结果集的上下文关系及分面信息,或者在检索的同时快速计算出结果对应的分面信息,缩短了系统响应时间,提升了用户体验同时降低了用户浏览成本。In addition, the faceted search technology of the present invention based on the formal concept lattice can realize the pre-calculation of the context relationship and facet information of the result set before the user searches, or to quickly calculate the facet information corresponding to the result at the same time of retrieval, which shortens the system time. The response time improves the user experience and reduces the user browsing cost.
以上所述仅为本发明的优选实施例而已,并不用于限制本发明,对于本领域的技术人员来说,本发明可以有各种更改和变化。凡在本发明的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。The above descriptions are only preferred embodiments of the present invention, and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall be included within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710483747.0A CN107391584B (en) | 2017-06-22 | 2017-06-22 | Faceted Search Method and System Based on Formal Concept Lattice |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710483747.0A CN107391584B (en) | 2017-06-22 | 2017-06-22 | Faceted Search Method and System Based on Formal Concept Lattice |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107391584A CN107391584A (en) | 2017-11-24 |
CN107391584B true CN107391584B (en) | 2020-12-11 |
Family
ID=60332700
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710483747.0A Active CN107391584B (en) | 2017-06-22 | 2017-06-22 | Faceted Search Method and System Based on Formal Concept Lattice |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107391584B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113094456B (en) * | 2021-04-09 | 2022-09-13 | 郑州大学 | A method for generating a walking path of a robot |
CN114048212B (en) * | 2021-11-16 | 2025-06-24 | 河南大学 | Construction and retrieval method of index architecture for frequent itemsets of spatial big data |
CN114358062B (en) * | 2021-12-23 | 2024-07-12 | 河南大学 | Yellow river bank dam dangerous situation identification method based on formal concept analysis |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101504746A (en) * | 2008-02-04 | 2009-08-12 | 株式会社普罗菲德 | Information processing apparatus, information processing method, and program |
CN101595476A (en) * | 2006-08-31 | 2009-12-02 | 启创互联公司 | The system, the method and computer program that are used for the client definition information architecture |
CN102495844A (en) * | 2011-11-15 | 2012-06-13 | 河海大学 | Improved GuTao method for creating user models |
CN103782250A (en) * | 2011-05-26 | 2014-05-07 | 电气银普股份有限公司 | Modularized control system to enable networked control and sensing of other devices |
CN104036046A (en) * | 2014-07-02 | 2014-09-10 | 重庆大学 | Deep Web query interface pattern matching method based on attribute co-occurrence mode |
JP5725623B2 (en) * | 2012-05-08 | 2015-05-27 | 日本電信電話株式会社 | Program analysis apparatus and method, and program |
CN105474166A (en) * | 2013-03-15 | 2016-04-06 | 先进元素科技公司 | Method and system for purposeful computing |
CN106227805A (en) * | 2016-07-17 | 2016-12-14 | 河南理工大学 | A kind of term definition method and system theoretical based on form concept analysis |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130198599A1 (en) * | 2012-01-30 | 2013-08-01 | Formcept Technologies and Solutions Pvt Ltd | System and method for analyzing a resume and displaying a summary of the resume |
IN2013CH05503A (en) * | 2013-11-29 | 2015-06-12 | Kalyanaraman Raghava |
-
2017
- 2017-06-22 CN CN201710483747.0A patent/CN107391584B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101595476A (en) * | 2006-08-31 | 2009-12-02 | 启创互联公司 | The system, the method and computer program that are used for the client definition information architecture |
CN101504746A (en) * | 2008-02-04 | 2009-08-12 | 株式会社普罗菲德 | Information processing apparatus, information processing method, and program |
CN103782250A (en) * | 2011-05-26 | 2014-05-07 | 电气银普股份有限公司 | Modularized control system to enable networked control and sensing of other devices |
CN102495844A (en) * | 2011-11-15 | 2012-06-13 | 河海大学 | Improved GuTao method for creating user models |
JP5725623B2 (en) * | 2012-05-08 | 2015-05-27 | 日本電信電話株式会社 | Program analysis apparatus and method, and program |
CN105474166A (en) * | 2013-03-15 | 2016-04-06 | 先进元素科技公司 | Method and system for purposeful computing |
CN104036046A (en) * | 2014-07-02 | 2014-09-10 | 重庆大学 | Deep Web query interface pattern matching method based on attribute co-occurrence mode |
CN106227805A (en) * | 2016-07-17 | 2016-12-14 | 河南理工大学 | A kind of term definition method and system theoretical based on form concept analysis |
Non-Patent Citations (2)
Title |
---|
Ligeng Zou等.An efficient algorithm for increasing the granularity levels of attributes in formal concept analysis.《Expert Systems With Applications》.2016,(第46期), * |
近似概念格及其增量构造算法研究;林春杰等;《计算机应用研究》;20120131;第29卷(第1期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN107391584A (en) | 2017-11-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106484875B (en) | MOLAP-based data processing method and device | |
US7827181B2 (en) | Click distance determination | |
KR101557294B1 (en) | Search results ranking using editing distance and document information | |
KR101793222B1 (en) | Updating a search index used to facilitate application searches | |
CN105027115B (en) | Query and index of documents | |
US6801904B2 (en) | System for keyword based searching over relational databases | |
US7392250B1 (en) | Discovering interestingness in faceted search | |
CN105045875B (en) | Personalized search and device | |
CN117973540A (en) | Retrieval enhancement generation system and method based on knowledge graph | |
Popescul et al. | Structural logistic regression for link analysis | |
Hu et al. | Top-k spatio-textual similarity join | |
CN103339624A (en) | High efficiency prefix search algorithm supporting interactive, fuzzy search on geographical structured data | |
CN107391584B (en) | Faceted Search Method and System Based on Formal Concept Lattice | |
CN106156271A (en) | Related information directory system based on distributed storage and foundation thereof and using method | |
CN103942198A (en) | Method and device for mining intentions | |
Khemmarat et al. | Fast top-k path-based relevance query on massive graphs | |
Martins et al. | Supporting schema references in keyword queries over relational databases | |
JP2011170461A (en) | Information accumulation retrieval method and information accumulation retrieval program | |
CN104794237A (en) | Web page information processing method and device | |
Lee et al. | Graph threshold algorithm | |
Echbarthi et al. | LaSaS: an Aggregated Search based Graph Matching Approach. | |
CN110569327A (en) | A Multi-Keyword Ciphertext Retrieval Method Supporting Dynamic Update | |
JP2000035965A (en) | Similar feature retrieval method and apparatus, and storage medium storing similar feature retrieval program | |
CN117171161A (en) | Data query method and device | |
CN113254718B (en) | Query relaxation method for semantic association search on graph data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |