WO2020068009A1 - A search engine and data warehouse system with vertical and thematic focus - Google Patents

A search engine and data warehouse system with vertical and thematic focus

Info

Publication number
WO2020068009A1
Authority
WO
WIPO (PCT)
Prior art keywords
search engine
vertical
data warehouse
warehouse system
thematic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/TR2018/050528
Other languages
French (fr)
Inventor
Mehmet Ali ERDAL
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Metaform Bilisim Iletisim Danismanlik San Tic Ltd Sti
Original Assignee
Metaform Bilisim Iletisim Danismanlik San Tic Ltd Sti
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Metaform Bilisim Iletisim Danismanlik San Tic Ltd Sti
Priority to PCT/TR2018/050528
Publication of WO2020068009A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a search engine and data warehouse system (1) with vertical and thematic focus, whereby corporate and/or professional users can control, customize and configure the evaluation parameters and algorithms applied to source data on the internet according to their needs, store the data in a dedicated data warehouse, and browse the stored source sites and links visually in a relational and depth-wise way.

Description

A SEARCH ENGINE AND DATA WAREHOUSE SYSTEM WITH VERTICAL AND THEMATIC FOCUS
Technical Field
The present invention relates to a search engine and data warehouse system with vertical and thematic focus, whereby users can control, customize and configure the evaluation parameters and algorithms applied to source data on the internet according to their needs, store the data in a dedicated data warehouse, and browse the stored source websites and links visually in a relational and depth-wise way.
Background of the Invention
Search engines in use today, such as Google, Yandex and Bing, are general-purpose services; users cannot control or customize the parameters and algorithms that evaluate the source data, nor can they store the data gathered for their needs in a database of their own. Considering the history of search engines, a small number of follower search engine services began operating worldwide under the leadership of Yahoo, after which Google came to dominate the field. Search engines are thus built by companies as public services that answer the searches of personal and corporate users of every country, age group and profile, and that seek their revenue models in those searches.
Horizontal search engines, which have to answer searches in every field and at every level for everyone, struggle to find what the user is looking for among the large volume of data obtained, and their success in relational or neural searches is unfortunately not high. Their interfaces present results as a list; to find the websites containing the information they need, users must still click the links in those lists manually, one by one. They also have to save the pages containing the searched information somewhere before forgetting them, return to the list at the point where they left off, continue clicking other links, and remember which ones they have already clicked. In addition, the large number of unrelated links listed for the search criteria makes it impossible for a person to reach the searched data at the rate at which links can be clicked.
The Chinese patent document no. CN107273499A discloses a data capture method based on vertical search engine. The method, which is subject to the invention disclosed in the Chinese patent document, determines an association degree of each webpage through crawling and analysing the webpage. At the same time, it cannot store the webpage and a website associated with each other according to an association degree threshold. Collection efficiency and storage efficiency are improved by achieving a multi-thread webpage crawling in the invention mentioned in the Chinese patent document.
Therefore, there is a need for a structure that overcomes the above-stated problems; that can be defined by the units of corporate structures, such as research and development, strategy, sales, human resources and purchasing, according to their own needs; that collects data continuously by searching in vertical, thematic and custom fields; and that listens to the requested websites within the specified scope and delivers a copy of the data to a data center prepared for corporate structures.
Summary of the Invention
An objective of the present invention is to realize a search engine and data warehouse system with vertical and thematic focus whereby corporate and/or professional users can control, customize and configure the parameters and algorithms that evaluate the source data according to their needs and store the data in a database.
Another objective of the present invention is to realize a system which stores the data captured by the search engine system with vertical and thematic focus, in which searching, sorting, archiving and browsing of web pages are carried out, by means of a data warehouse service hosted on the internet or in the customer's data center.
Another objective of the present invention is to realize a system whereby users can switch from one search engine data warehouse with a vertical or thematic focus for which they are authorized to another.
Another objective of the present invention is to realize a search engine and data warehouse system with vertical and thematic focus wherein the vertical, thematic or custom field required by corporate structures, and its scope, are identified.
Another objective of the present invention is to realize a search engine and data warehouse system with vertical and thematic focus whereby a vertical field can be identified; which can operate in focused mode; and which continuously listens to the websites determined by the users within the scope of the identified theme, whether it operates in a vertical field, in focused mode, or both.
Another objective of the present invention is to realize a search engine and data warehouse system with vertical and thematic focus whereby the pages comprising the data within the scope of the identified theme, and the domains where those pages are published, are archived categorically and automatically together with new/old version information.
Another objective of the present invention is to realize a search engine and data warehouse system with vertical and thematic focus whereby the users who will make use of the outputs of each thematic search engine are identified according to their authorizations.
Another objective of the present invention is to realize a search engine and data warehouse system with vertical and thematic focus whereby the pages where the data archived within the scope of the identified theme or themes are located, and the websites where those pages are published, are examined in a depth-wise visual graphic interface.
Another objective of the present invention is to realize a search engine and data warehouse system with vertical and thematic focus whereby a subject of interest (SOI) identification that identifies each thematic field can be made.
Detailed Description of the Invention
“A Search Engine and Source Data System with Vertical and Thematic Focus” realized to fulfil the objectives of the present invention is shown in the figure attached, in which:
Figure 1 is a schematic view of the inventive search engine and source data system with vertical and thematic focus.
The components illustrated in the figure are individually numbered, where the numbers refer to the following:
1. System
2. Identification module
3. Indexing module
4. Index database
5. Calculation module
6. Database
The inventive search engine and data warehouse system with vertical and thematic focus (1) performing thematic search comprises:
- at least one identification module (2) wherein the theme is identified vertically and as focused, configurations are realized and which provides reports and statistics of the identified themes, displays the data in a relational structure, operates as web based;
- at least one indexing module (3) which examines web pages depending on the theme determined in the identification module (2), indexes the examined web pages, inserts URL’s of the web pages into a queue by using a sorting algorithm and which is in communication with an index database (4) wherein the indexed URL’s (Uniform Resource Locator) are stored; and
- at least one calculation module (5) which calculates the relation level depending on the theme configurations on the data of the URL in the queue, records the data in the database (6) and carries out tagging transactions.
In a preferred embodiment of the invention, the identification module (2) is configured to recalculate the relation level according to user feedback. In addition, the identification module (2) can carry out theme-specific user identification.
Web pages to be listened to and indexed at the users' request are entered on the identification module (2). Depending on the identifications entered on the identification module (2), the websites and pages containing only the data requested and needed by the users are automatically and continually delivered to the users, and the determined websites are listened to continuously. The users are thereby relieved of the workload of individually searching millions of websites, examining them, deciding which ones contain the desired data (a time-consuming task even for qualified users), and copying them somewhere.
Users who will make use of the outputs of each thematic search engine are identified, according to their authorizations, on the identification module (2). The identification module (2) also presents, on a depth-wise visual graphic interface, the pages where the data archived within the scope of the theme or themes by which the institution or institutions are identified are located, and the websites that publish those pages.
A SOI (subject of interest) identification that identifies each thematic field is also made on the identification module (2). For example, for a thematic field focused on TESLA, the subjects of interest may be “competitors, investments, products, executives, reports, suppliers, sales or partners”.
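As an illustration only, a theme entered on the identification module (2) could be represented as a small record holding its focus mode, subjects of interest, watched websites and authorized users. The class `Theme`, its field names, the example URL and the user name are all hypothetical; the specification does not define a data model, and Python is used here purely for brevity although the preferred embodiment names Java.

```python
from dataclasses import dataclass, field


@dataclass
class Theme:
    """Hypothetical theme definition; not a structure taken from the patent."""
    name: str       # company, person, case or topic the theme is built around
    focused: bool   # True: operate in focused mode for this theme
    subjects_of_interest: list[str] = field(default_factory=list)
    watched_sites: list[str] = field(default_factory=list)    # sites to listen to
    authorized_users: set[str] = field(default_factory=set)   # may see the outputs

    def can_view(self, user: str) -> bool:
        """Only users authorized for this theme may use its outputs."""
        return user in self.authorized_users


# The TESLA example from the description, with assumed site and user values.
tesla = Theme(
    name="TESLA",
    focused=True,
    subjects_of_interest=["competitors", "investments", "products", "executives",
                          "reports", "suppliers", "sales", "partners"],
    watched_sites=["https://example.com/news"],
    authorized_users={"analyst_1"},
)
```

Keeping the authorization set on the theme itself mirrors the statement that user identification is theme-specific; a real deployment would more likely keep it in a separate access-control store.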
In a preferred embodiment of the invention, the indexing module (3) is developed using Java and open-source technologies. Before enqueuing the related page, the indexing module (3) first splits the URL and scales it. A preferred embodiment uses an algorithm specifically developed for this operation. After splitting the URL, the indexing module (3) separates it into word roots. In a preferred embodiment, the indexing module (3) performs the separation into word roots by using NLP (natural language processing) frameworks.
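The specification does not disclose the splitting and scaling algorithm, so the following is only a minimal sketch of one way such a step might look. It assumes that the URL is split into host, path and query fields, that each field carries a user-settable weight (`FIELD_SCALES`, an invented name with invented values), and that a URL's value is the weighted count of tokens whose root matches a theme term. The `crude_stem` helper is a deliberately rough stand-in for an NLP framework, not a real stemmer.

```python
import re
from urllib.parse import urlparse

# Assumed per-field weights ("scales"); in the described system users set these.
FIELD_SCALES = {"host": 1.0, "path": 2.0, "query": 0.5}


def crude_stem(word: str) -> str:
    """Rough stand-in for an NLP stemmer: strips a few English suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def split_url(url: str) -> dict[str, list[str]]:
    """Split a URL into fields, then each field into stemmed word tokens."""
    parts = urlparse(url)
    raw = {"host": parts.netloc, "path": parts.path, "query": parts.query}
    return {name: [crude_stem(t) for t in re.findall(r"[a-z]+", text.lower())]
            for name, text in raw.items()}


def url_value(url: str, theme_roots: set[str]) -> float:
    """Weighted count of URL tokens that match a theme root."""
    tokens = split_url(url)
    return sum(FIELD_SCALES[name] * sum(1 for t in toks if t in theme_roots)
               for name, toks in tokens.items())
```

For example, `url_value("https://www.example.com/tesla/suppliers?ref=reports", {"tesla", "supplier", "report"})` counts two matches in the path and one in the query, giving 2 × 2.0 + 1 × 0.5 = 4.5 under the assumed weights.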
In the invention, the indexing module (3) is configured so that scales can be determined per field and users can set the related scales themselves. The indexing module (3) prioritizes the queue so that the URLs with the highest value are processed first. In a preferred embodiment, the indexing module (3) performs this prioritization by means of a priority queue algorithm. For access to web pages, the indexing module (3) preferably uses an open-source Java HTML processor. The indexing module (3) uses a pre-ranking algorithm so that ranking is made in order of importance.
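The highest-value-first ordering described above can be sketched with a binary heap. This is an assumption about one possible realization, not the patented queue: the class `UrlQueue` and its methods are hypothetical names, and Python's standard `heapq` module is used in place of whatever Java structure the embodiment employs.

```python
import heapq
import itertools


class UrlQueue:
    """Max-priority queue: the URL with the highest value is dequeued first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: equal values keep insertion order

    def push(self, url: str, value: float) -> None:
        # heapq implements a min-heap, so the value is negated to pop the largest first.
        heapq.heappush(self._heap, (-value, next(self._counter), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Pushing and popping are both O(log n), which matters when the queue is refilled continuously by a crawler. The counter avoids comparing URL strings when two entries have the same value.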
When it is desired to operate in focused mode in accordance with the identification, the indexing module (3) takes into account, while examining the URLs, the requirement that the focus of interest must also be present in the examined page. When it is desired to operate without focus in accordance with the identification, the indexing module (3) calculates the theme-relation levels of the data iteratively by using the related feedback algorithm.
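The two operating modes can be sketched as follows. The focused-mode check is a plain substring test, which is an assumption about what "the focus of interest must be in the examined page" means in practice. For the unfocused mode, the specification names only a "related feedback algorithm" without detail; the sketch substitutes the well-known Rocchio relevance-feedback update as a stand-in, with conventional but arbitrary coefficients. Function names and parameters are hypothetical.

```python
from collections import Counter


def passes_focus(page_text: str, focus_term: str, focused: bool) -> bool:
    """In focused mode a page qualifies only if the focus term occurs in it."""
    return (not focused) or focus_term.lower() in page_text.lower()


def rocchio_update(weights, relevant_docs, irrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """One feedback round (Rocchio, used here as a stand-in technique).

    weights: dict mapping term -> weight; docs: lists of word tokens.
    Terms move toward pages marked relevant and away from irrelevant ones.
    """
    def centroid(docs):
        total = Counter()
        for doc in docs:
            total.update(doc)
        return {t: c / len(docs) for t, c in total.items()} if docs else {}

    pos, neg = centroid(relevant_docs), centroid(irrelevant_docs)
    terms = set(weights) | set(pos) | set(neg)
    updated = {t: alpha * weights.get(t, 0.0)
                  + beta * pos.get(t, 0.0)
                  - gamma * neg.get(t, 0.0)
               for t in terms}
    # Terms whose weight drops to zero or below are discarded.
    return {t: w for t, w in updated.items() if w > 0}
```

Calling `rocchio_update` repeatedly, once per batch of user feedback, gives the iterative recalculation the paragraph describes: each round starts from the weights produced by the previous one.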
In a preferred embodiment of the invention, the indexing module (3) archives the pages that comprise the data within the scope of the theme defined in the identification module (2), and the domains where these pages are published, in the index database (4) together with the new/old version information, categorically and automatically.
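One plausible way to keep new/old version information is to store a new version only when the page content changes, grouped by publishing domain. The sketch below assumes this behavior and uses an in-memory dictionary in place of the index database (4); the class `PageArchive`, its methods, and the use of SHA-256 for change detection are illustrative choices, not details from the specification.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlparse


class PageArchive:
    """In-memory stand-in for the index database (4); hypothetical design."""

    def __init__(self):
        # domain -> url -> list of (version_number, content_hash, body)
        self._store = defaultdict(lambda: defaultdict(list))

    def archive(self, url: str, body: str) -> int:
        """Store body as a new version if it differs from the latest one.

        Returns the version number that now holds this content.
        """
        versions = self._store[urlparse(url).netloc][url]
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if versions and versions[-1][1] == digest:
            return versions[-1][0]          # unchanged: keep the current version
        versions.append((len(versions) + 1, digest, body))
        return len(versions)

    def history(self, url: str):
        """All stored versions of a page, oldest first."""
        return [(n, body) for n, _, body in self._store[urlparse(url).netloc][url]]
```

Because unchanged content is detected by hash, continuously listening to a website does not create a new version on every visit, only when the page actually changes.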
In a preferred embodiment of the invention, the calculation module (5) is developed using Java and open-source technologies, listens to the queue, and calculates the relation level by applying an algorithm to the data of the URL in the queue. The calculation module (5) takes the theme configuration into account when it calculates the relation level. The calculation module (5) separates the data in the webpage body into word roots by using open-source natural language processing structures and calculates the relation level of the page. When tagging inside the body of the webpage, the calculation module (5) uses the word-tag pairs added for the related theme.
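The relation-level formula is not given in the specification. As a minimal sketch, the code below assumes the relation level is the weighted share of body words that match theme terms, and that tagging looks up each body word in a table of word-tag pairs. Both functions and their parameter names are hypothetical, and stemming into word roots is omitted for brevity, so words are matched exactly.

```python
import re
from collections import Counter


def relation_level(body: str, theme_weights: dict[str, float]) -> float:
    """Assumed scoring: weighted count of theme terms divided by total words."""
    words = re.findall(r"[a-z]+", body.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(w * counts[t] for t, w in theme_weights.items()) / len(words)


def tag_page(body: str, word_tags: dict[str, str]) -> set[str]:
    """Return the tags whose trigger word occurs in the page body."""
    words = set(re.findall(r"[a-z]+", body.lower()))
    return {tag for word, tag in word_tags.items() if word in words}
```

For the body "Tesla signs battery supplier deal" and the weights `{"tesla": 1.0, "supplier": 0.5}`, two of the five words match and the score is (1.0 + 0.5) / 5 = 0.3. Normalizing by page length keeps long pages from outranking short ones merely by repeating a term.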
In the inventive system (1), the calculation module (5) examines, preferably by using artificial intelligence techniques, the densities and frequencies with which the user's words occur in the web pages that were captured within the theme identified on the identification module (2) and enqueued by the indexing module (3). In a preferred embodiment, the calculation module (5) uses artificial intelligence techniques such as clustering and regression for this examination.
In a preferred embodiment of the invention, the indexing module (3) and the database (6) exist together in a data warehouse structure. Thus, the relation level is continuously calculated on the collected data depending on the data input from the identification module (2), and the data can be updated so as to be processed in queue order. By means of the data warehouse structure, users can easily switch from one search engine data warehouse with vertical or thematic focus for which they are authorized to another.
The inventive search engine and data warehouse system with vertical and thematic focus (1) can be defined by corporate structures according to the needs of their units, such as research and development, strategy, sales, human resources and purchasing, and it collects data continuously by searching in vertical, thematic and custom fields. In addition, the inventive system continuously listens to the requested websites within the specified scope and stores a copy of the data in a data center prepared for corporate structures.
With the inventive system (1), the vertical, thematic or custom field and scope required by the units, departments, customers or stakeholders of public institutions, private sector organizations, universities, institutes and non-governmental organizations, or by professional end users, can be identified. With the search engine and data warehouse system with vertical and thematic focus (1), a vertical field can be identified, for example cancer, artificial vision or laser; the system can also take a company, person or case as a basis and operate in focused mode in order to collect data in the required fields related to it; and, whether operating in both ways or in only one, it continuously listens to the websites indicated by the users within the scope of the identified theme.
With the inventive search engine and data warehouse system with vertical and thematic focus (1), users can control, customize and configure the evaluation parameters and algorithms applied to source data on the internet according to their needs, store the data in a dedicated data warehouse, and browse the stored source websites and links visually in a relational and depth-wise way.
Various embodiments of the inventive search engine system with vertical and thematic focus (1) can be developed; the invention is not limited to the examples disclosed herein and is essentially as defined in the claims.

Claims

1. A search engine and data warehouse system with vertical and thematic focus (1) characterized by:
- at least one identification module (2) wherein the theme is identified vertically and as focused, configurations are realized and which provides reports and statistics of the identified themes, displays the data in a relational structure, operates as web based;
- at least one indexing module (3) which examines web pages depending on the theme determined in the identification module (2), indexes the examined web pages, inserts URL’s of the web pages into a queue by using a sorting algorithm and which is in communication with an index database (4) wherein the indexed URL’s (Uniform Resource Locator) are stored; and
- at least one calculation module (5) which calculates the relation level depending on the theme configurations on the data of the URL in the queue, records the data in the database (6) and carries out tagging transactions.
2. A search engine and data warehouse system with vertical and thematic focus (1) according to Claim 1; characterized by the identification module (2) which is configured in order to recalculate the relation level according to the user feedback.
3. A search engine and data warehouse system with vertical and thematic focus (1) according to Claim 1 or 2; characterized by the identification module (2) which carries out a theme-specific user identification transaction.
4. A search engine and data warehouse system with vertical and thematic focus (1) according to any of the preceding claims; characterized by the identification module (2) whereon the web pages to be monitored and indexed are entered depending on the users’ requests.
5. A search engine and data warehouse system with vertical and thematic focus (1) according to any of the preceding claims; characterized by the identification module (2) whereon an identification transaction is carried out with the authorizations of the users who will make use of the outputs of each thematic search engine.
6. A search engine and data warehouse system with vertical and thematic focus (1) according to any of the preceding claims; characterized by the identification module (2) which presents, on a depth-wise visual graphic interface, the pages containing the data archived within the scope of the theme or themes by which the institution or institutions are identified, together with the websites that publish those pages.
7. A search engine and data warehouse system with vertical and thematic focus (1) according to any of the preceding claims; characterized by the identification module (2) wherein SOI (subject of interest) identification that will identify each thematic field is also made.
8. A search engine and data warehouse system with vertical and thematic focus (1) according to any of the preceding claims; characterized by the indexing module (3) which first splits the URL and scales it before enqueuing the related page.
9. A search engine and data warehouse system with vertical and thematic focus (1) according to Claim 8; characterized by the indexing module (3) which is configured in order to separate the URL into word roots after splitting it.
10. A search engine and data warehouse system with vertical and thematic focus (1) according to Claim 8 or 9; characterized by the indexing module (3) which carries out the transaction of separating into word roots by using NLP (natural language processing) frameworks.
11. A search engine and data warehouse system with vertical and thematic focus (1) according to any of the preceding claims; characterized by the indexing module (3) which is configured in order to ensure that scales can be determined according to fields and the users can determine the related scales by themselves.
12. A search engine and data warehouse system with vertical and thematic focus (1) according to any of the preceding claims; characterized by the indexing module (3) which is configured, in cases where it is desired to operate as focused in accordance with the identification, to take into consideration while examining the URLs the obligation that the focus of interest must also be present in the examined page.
13. A search engine and data warehouse system with vertical and thematic focus (1) according to Claim 12; characterized by the indexing module (3) which calculates theme-relation levels of the data iteratively by using the related feedback algorithm in cases where it is desired to operate without focus in accordance with the identification.
14. A search engine and data warehouse system with vertical and thematic focus (1) according to any of the preceding claims; characterized by the indexing module (3) which archives the pages that comprise the data within the scope of the theme defined in the identification module (2), and the domains where these pages are published, in the index database (4) categorically and automatically together with the new and old version information.
15. A search engine and data warehouse system with vertical and thematic focus (1) according to any of the preceding claims; characterized by the calculation module (5) which is configured in order to be developed using Java and open-source software and to calculate the relation level by listening to the queue and applying an algorithm to the data of the URL in the queue.
16. A search engine and data warehouse system with vertical and thematic focus (1) according to any of the preceding claims; characterized by the calculation module (5) which separates the data in the webpage body into the word roots by using open-source natural language processing structures and calculates the relation level of the page.
17. A search engine and data warehouse system with vertical and thematic focus (1) according to Claim 16; characterized by the calculation module (5) which uses the word-tag duo added for the related theme while it carries out the tagging transaction inside the body of the webpage.
18. A search engine and data warehouse system with vertical and thematic focus (1) according to any of the preceding claims; characterized by the calculation module (5) which examines, preferably by using artificial intelligence techniques, the intensities and frequencies of the words of the user that are used in the webpages which are enqueued by the indexing module (3) upon being captured within the theme identified via the identification module (2).
19. A search engine and data warehouse system with vertical and thematic focus (1) according to Claim 18; characterized by the calculation module (5) which uses artificial intelligence techniques such as clustering and regression in a preferred embodiment for the examination transaction.
20. A search engine and data warehouse system with vertical and thematic focus (1) according to any of the preceding claims; characterized by the indexing module (3) which uses a pre-ranking algorithm so that ranking is made according to order of importance while the webpages are indexed gradually.
21. A search engine and data warehouse system with vertical and thematic focus (1) according to any of the preceding claims; characterized by the indexing module (3) and the database (6) which co-exist in a data warehouse structure.
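By way of illustration only, the URL handling recited for the indexing module (3) (splitting the URL, separating it into word roots, scaling it and enqueuing it in order of importance) might be sketched in Python as follows. Every name here (split_url, naive_root, url_scale, UrlQueue) is hypothetical. The suffix stripper is a deliberately crude stand-in for an NLP framework, and the additive scale and the heap-based queue are assumptions, not the sorting or pre-ranking algorithm of the application.

```python
import heapq
import re
from typing import Dict, List, Tuple
from urllib.parse import urlsplit


def split_url(url: str) -> List[str]:
    """Split the host and path of a URL into lowercase alphabetic tokens."""
    parts = urlsplit(url)
    return re.findall(r"[a-z]+", (parts.netloc + " " + parts.path).lower())


def naive_root(word: str) -> str:
    """Placeholder root extraction that strips two common English suffixes.

    A real system would call an NLP framework here instead.
    """
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def url_scale(url: str, theme_roots: Dict[str, float]) -> float:
    """Sum the user-defined weights of the theme roots found in the URL."""
    roots = [naive_root(token) for token in split_url(url)]
    return sum(theme_roots.get(root, 0.0) for root in roots)


class UrlQueue:
    """Priority queue that returns the URL with the highest scale first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = 0  # insertion order breaks ties between equal scales

    def enqueue(self, url: str, scale: float) -> None:
        # heapq is a min-heap, so the scale is negated to pop the largest first.
        heapq.heappush(self._heap, (-scale, self._counter, url))
        self._counter += 1

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

With the assumed weights {"vaccine": 2.0, "flu": 1.0}, a URL whose path contains "flu-vaccines" receives a scale of 3.0 and is dequeued before a URL that only contains "flu" or one that contains no theme root.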

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/TR2018/050528 WO2020068009A1 (en) 2018-09-26 2018-09-26 A search engine and data warehouse system with vertical and thematic focus

Publications (1)

Publication Number Publication Date
WO2020068009A1 (en) 2020-04-02

Family

ID=69952424

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/TR2018/050528 Ceased WO2020068009A1 (en) 2018-09-26 2018-09-26 A search engine and data warehouse system with vertical and thematic focus

Country Status (1)

Country Link
WO (1) WO2020068009A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100174719A1 (en) * 2009-01-06 2010-07-08 Jorge Alegre Vilches System, method, and program product for personalization of an open network search engine
CN102298622A (en) * 2011-08-11 2011-12-28 中国科学院自动化研究所 Search method for focused web crawler based on anchor text and system thereof
US20130097158A1 (en) * 2011-10-13 2013-04-18 Nageswara Pobbathi Method and system for customizing a web site
US20180189418A1 (en) * 2015-03-30 2018-07-05 Yandex Europe Ag Method of and system for processing a search query

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18935364

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18935364

Country of ref document: EP

Kind code of ref document: A1