
TWI861875B - News public opinion analysis and collection system and method thereof - Google Patents

News public opinion analysis and collection system and method thereof

Info

Publication number
TWI861875B
Authority
TW
Taiwan
Prior art keywords
public opinion
news
data
server
standardized
Application number
TW112117805A
Other languages
Chinese (zh)
Other versions
TW202445381A (en)
Inventor
張淑蕙
周彥廷
周智勇
何柏霖
Original Assignee
合作金庫商業銀行股份有限公司 (Taiwan Cooperative Bank)
Application filed by 合作金庫商業銀行股份有限公司
Priority to TW112117805A
Application granted
Publication of TWI861875B
Publication of TW202445381A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A public opinion analysis and collection system includes three components. An extract-transform-load (ETL) server is coupled to multiple data sources and collects and standardizes multiple items of news and public opinion data. A database server is coupled to the ETL server and stores the standardized news and public opinion data. A query device is coupled to the database server and uses a criterion to obtain the required news and public opinion data from the standardized data. The criterion uses the number of condition options (A) as the denominator and the number of set conditions satisfied (B) as the numerator. Standardized news and public opinion data whose score (B)/(A) under the criterion is closest to or equal to 1 are defined as "accurate information".

Description

Public opinion analysis and collection system and method thereof

The present invention relates to a public opinion analysis and collection system and a method thereof.

Analyzing and managing public opinion has become an important part of corporate risk management. Conventionally, news and public opinion were collected by assigning dedicated staff to search the Internet for a specific company. The staff then manually gathered relevant news and public opinion for interpretation. This consumes considerable manpower and prevents the data collection process from being standardized. In addition, different people often obtain different search results, which frequently leads to divergent conclusions in subsequent judgments.

Therefore, those in this field have been seeking a standardized system and method for collecting and querying news and public opinion.

One embodiment of the present invention provides a public opinion analysis and collection system. The system includes an extract-transform-load (ETL) server, a database server, and a query device. The ETL server is coupled to a plurality of data sources and collects and standardizes a plurality of items of news and public opinion data. The database server is coupled to the ETL server and stores the standardized news and public opinion data. The query device is coupled to the database server and uses a criterion to obtain the required news and public opinion data from the standardized data. The criterion uses the number of condition options (A) as the denominator and the number of set conditions satisfied (B) as the numerator. Standardized news and public opinion data whose score (B)/(A) under the criterion is closest to or equal to 1 are defined as "accurate information".

In some embodiments, standardized news and public opinion data whose score (B)/(A) under the criterion ranks next after the accurate information are defined as "less accurate information".

In some embodiments, the condition options include keywords, data sources, information publication time, or information collection time.

In some embodiments, a set condition of the criterion is applied as a filter to the content of one item of the news and public opinion data. If the filter yields more than 0 occurrences, that set condition is satisfied and adds 1 to the number of satisfied set conditions.
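The rule above can be written as a formula. This assumes that each of the A condition options c_1, …, c_A is applied to a record d, and that n_i(d) is the number of occurrences that condition c_i finds in d. The symbols d, n_i, and s are introduced here for illustration only and do not appear in the original.

```latex
B(d) = \sum_{i=1}^{A} \mathbf{1}\left[\, n_i(d) > 0 \,\right],
\qquad
s(d) = \frac{B(d)}{A} \in [0, 1]
```

Under this reading, a record that satisfies every condition option has s(d) = A/A = 1, which is the "accurate information" case. Records ranked below it are the less accurate ones.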

In some embodiments, the data sources include external websites, news homepage websites, regulatory authority announcement websites, or material information announcement websites.

In some embodiments, the database server periodically transmits the news and public opinion data that match the criterion to the query device.
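The periodic transmission could be realized as a scheduled job that re-runs a saved criterion over newly collected records. The following is a minimal Python sketch built on three assumptions: each record carries a collection timestamp, the saved criterion is a predicate, and the period is one day. `Record`, `Subscription`, and `daily_digest` are hypothetical names, and delivery of the digest (for example by email) is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class Record:
    title: str
    collected_at: datetime  # assumed field: when the ETL server loaded the record


@dataclass
class Subscription:
    email: str                         # where a real system would send the digest
    matches: Callable[[Record], bool]  # the user's saved criterion (assumed shape)


def daily_digest(records: List[Record], sub: Subscription, now: datetime) -> List[Record]:
    """Return matching records collected in the 24 hours ending at `now`."""
    window_start = now - timedelta(days=1)
    return [
        r for r in records
        if window_start < r.collected_at <= now and sub.matches(r)
    ]
```

With a half-open window like (now - 1 day, now], consecutive daily runs never report the same record twice.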

In some embodiments, the ETL server uses a crawler program to collect news and public opinion data from the data sources according to a tracking list, and then standardizes that data.

In some embodiments, the public opinion analysis and collection system further includes an upload device coupled to the ETL server for transmitting the tracking list to the ETL server.

In some embodiments, the ETL server also uses the crawler program to periodically collect news and public opinion data from the data sources according to the tracking list. It then standardizes the data and stores it in the database server.
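The periodic, tracking-list-driven collection can be sketched as a single collection round, which a scheduler such as cron would invoke at a fixed interval. This minimal Python sketch is not the patented implementation. The crawler is abstracted as an injected `fetch` callable, and the `store` dict stands in for the database server. The idea that an article's URL uniquely identifies it is our assumption, and `collect_once` is a hypothetical name.

```python
from typing import Callable, Dict, Iterable, List

# Stand-in for the crawler: (source_url, tracked_name) -> raw article dicts.
# This signature is an assumption made for the sketch.
Fetcher = Callable[[str, str], List[Dict[str, str]]]


def collect_once(tracking_list: Iterable[str], sources: Iterable[str],
                 fetch: Fetcher, store: Dict[str, Dict[str, str]]) -> int:
    """Crawl every source for every tracked name; return how many new records were stored."""
    source_list = list(sources)  # iterate the sources once per tracked name
    added = 0
    for name in tracking_list:
        for url in source_list:
            for article in fetch(url, name):
                key = article["url"]  # assumption: the article URL identifies a record
                if key not in store:
                    store[key] = {**article, "tracked_name": name, "source": url}
                    added += 1
    return added
```

Injecting `fetch` keeps the round testable without network access. A production crawler would also need rate limiting and retries, which are omitted here.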

Another embodiment of the present invention provides a public opinion analysis and collection method that includes three steps. First, an extract-transform-load (ETL) server coupled to a plurality of data sources collects and standardizes a plurality of items of news and public opinion data. Second, a database server coupled to the ETL server stores the standardized news and public opinion data. Third, a query device coupled to the database server uses a criterion to obtain the required news and public opinion data from the standardized data. The criterion uses the number of condition options (A) as the denominator and the number of set conditions satisfied (B) as the numerator. Standardized news and public opinion data whose score (B)/(A) under the criterion is closest to or equal to 1 are defined as "accurate information".

The public opinion analysis and collection system of the present invention improves on the current practice of manually collecting news, regulatory press releases, and material information. It presents data sets as topic tags that users can browse, share, or subscribe to. To give users more precise information, the news and public opinion data in the database server can be filtered by the criterion and divided into "accurate information" and "less accurate information". Each category is then provided to users as needed, which saves them time when reviewing data.

The spirit of the present invention is illustrated below with drawings and a detailed description. After understanding the embodiments, anyone with ordinary skill in the art may make changes and modifications based on the techniques taught herein without departing from the spirit and scope of the present invention.

The terms used herein only describe specific embodiments and are not intended to limit the present invention. As used herein, singular forms such as "a", "an", "this", and "the" also include the plural forms.

As used herein, "coupled" or "connected" may mean that two or more elements or devices are in direct or indirect physical contact with each other. It may also mean that two or more elements or devices interoperate or act on each other.

As used herein, "include", "comprise", "have", "contain", and similar terms are open-ended and mean "including but not limited to".

As used herein, "and/or" includes any and all combinations of the listed items.

Unless otherwise specified, the terms used herein have their ordinary meaning in the art, in the context of this disclosure, and in the specific context where each term appears. Certain terms used to describe the present invention are discussed below or elsewhere in this specification to give those skilled in the art additional guidance.

In the traditional approach to collecting news and public opinion, dedicated staff must search the Internet for a specific enterprise and manually gather relevant news and public opinion for interpretation. This requires a lot of manpower and prevents the data collection process from being standardized. The present invention therefore provides a system mechanism for public opinion collection that replaces manual searching with a standardized query process.

Please refer to FIG. 1, which shows a functional flow chart 50 of a public opinion analysis and collection system according to some embodiments of the present invention. The system includes a data collection function 52. This function connects to data sources 56 and searches online information either in system batches or at times specified by the user. The data sources 56 include external websites, news homepage websites, regulatory authority announcement websites, material information announcement websites, and other designated websites. The data collection function 52 also includes a public opinion analysis function 54, which can analyze public opinion by search conditions 55 or by topic classification 57. After data collection 52, the system applies a data judgment function 60 to obtain "accurate information" from the collected data. Data judgment spares users from searching for and organizing information scattered across many websites, and the information that passes it is the "accurate information" users need. The data judgment function 60 may, for example, use a "keyword verification of information relevance 62" function and/or a "website-source determination of data provenance 64" function. After data collection 52 and data judgment 60, the public opinion data is stored in the database 66, where users can "query or subscribe 68".
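The two example checks of the data judgment function 60 might look like the following Python sketch, which rests on our own assumptions. Function 62 is read as "every query keyword occurs at least once in the text", and function 64 is read as a host allow-list. The function names and the record fields `text` and `url` are hypothetical. The sketch requires both checks to pass, which is one of the "and/or" combinations the text allows.

```python
from typing import Dict, List, Set
from urllib.parse import urlparse


def keyword_relevant(text: str, keywords: List[str]) -> bool:
    """Function 62 (sketch): every keyword must occur in the text at least once."""
    return all(text.count(k) > 0 for k in keywords)


def from_listed_source(url: str, allowed_hosts: Set[str]) -> bool:
    """Function 64 (sketch): accept the host itself or any subdomain of an allowed host."""
    host = urlparse(url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)


def passes_judgment(record: Dict[str, str], keywords: List[str], allowed_hosts: Set[str]) -> bool:
    """Data judgment 60 (sketch): keep a record only if both checks pass."""
    return keyword_relevant(record["text"], keywords) and from_listed_source(record["url"], allowed_hosts)
```

Matching on the parsed host name rather than on a URL substring prevents look-alike domains such as `regulator.example.evil.test` from passing the source check.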

Please refer to FIG. 2, which shows a functional block diagram of a public opinion analysis and collection system according to some embodiments of the present invention. The public opinion analysis and collection system 100 includes an extract-transform-load (ETL) server 110 (hereinafter ETL server 110), a database server 120, a query device 130, and an upload device 140. In some embodiments, the ETL server 110 is coupled to the database server 120. The ETL server 110 performs data integration: it collects data from different data sources or news websites 200, 201…20n, standardizes it, and loads it into the database server 120 for analysis and storage. In some embodiments, the ETL server 110 is coupled to a plurality of different data sources or news websites 200, 201…20n. The user 150 sends a tracking list through the upload device 140, and the ETL server 110 uses it to collect the corresponding news and public opinion data from these sources and store it in the database server 120. The ETL server 110 also regularly updates the stored news and public opinion data based on the tracking list and the latest data on the news websites 200, 201…20n. The user 150 can send a query list through the query device 130, and the database server 120 retrieves the corresponding news and public opinion data and returns it to the query device 130 for display. The data sets are presented as topic tags that users can browse, share, or subscribe to.

In some embodiments, the data sources or news websites 200, 201…20n are distributed across multiple systems and use different languages. The ETL server 110 can convert their news and public opinion data into a unified format and type, which makes the data easier to analyze. The news websites 200, 201…20n may be external websites, news homepage websites, regulatory authority announcement websites, material information announcement websites, and other designated websites. The external websites may include a company financial report website, a joint credit information website, a public information website, and a news website, which provide a company's financial reports, joint credit information, public information, and news and public opinion. In some embodiments, the ETL server 110 extracts data from the news websites 200, 201…20n according to the tracking list transmitted by the upload device 140. In some embodiments, the ETL server 110 uses a web crawler to crawl website data from the news websites 200, 201…20n. Data from different news websites 200, 201…20n may have different structures and characteristics. The ETL server 110 therefore transforms the extracted data into a common format. It then cleans the data to remove noise, repair missing values and inconsistencies, and delete duplicates. The transformed data is then loaded into the database server 120 for analysis and storage. The user 150 can then enter a criterion through the query device 130 to retrieve the required data from the database server 120. In some preferred embodiments, the ETL server 110 and the database server 120 each have the components they need to operate, such as a processor unit, a communication unit, and a storage unit.
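The transform step (standardize, clean, repair, deduplicate) might be sketched in Python as follows. Several details are illustrative assumptions rather than details from the original:

- the common schema (`title`, `body`, `source`, `published_at`, `collected_at`);
- the alternate raw field names;
- treating "noise" as HTML markup and stray whitespace;
- repairing a missing title from the body;
- using (source, title) as the deduplication key.

```python
import html
import re
from datetime import datetime
from typing import Dict, List, Optional

TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")


def clean_text(raw: Optional[str]) -> str:
    """Remove HTML tags and entities and collapse runs of whitespace."""
    if not raw:
        return ""
    text = html.unescape(TAG_RE.sub(" ", raw))
    return SPACE_RE.sub(" ", text).strip()


def standardize(raw: Dict[str, str], source: str, collected_at: datetime) -> Dict[str, object]:
    """Map one raw record onto the assumed common schema."""
    title = clean_text(raw.get("title") or raw.get("headline"))
    body = clean_text(raw.get("body") or raw.get("content"))
    published = raw.get("published")
    return {
        "title": title or body[:30],  # repair a missing title from the start of the body
        "body": body,
        "source": source,
        "published_at": datetime.fromisoformat(published) if published else None,
        "collected_at": collected_at,
    }


def deduplicate(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep the first record for each (source, title) pair."""
    seen = set()
    unique = []
    for r in records:
        key = (r["source"], r["title"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```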

In some embodiments, the query device 130 communicates with the database server 120 via the Internet. The query device 130 may be a terminal device of the user 150, such as a personal computer, a laptop, a smartphone, or a tablet computer, but is not limited to these. The user 150 is also not limited to a single query device 130 and may have several at the same time. In some embodiments, a query program 131 is pre-installed on the query device 130. The query program 131 is provided by the public opinion analysis and collection system 100. It offers a user interface through which the user 150 can use the query service of the database server 120 and view the query results. The query program 131 can be operated by the public opinion analysis and collection system 100. Through the user interface of the query program 131, the user 150 can enter a specific customer or a customer list to be queried. The database server 120 then searches for that customer or customer list and displays the results in the user interface of the query program 131. In some embodiments, the database server 120 has a search program 121. The search program 121 searches the database server 120 for the specific customer or customer list passed in by the query program 131 and returns the results to the user interface of the query program 131 for display. In some embodiments, the ETL server 110 regularly updates the stored news and public opinion data according to the latest data on the news websites 200, 201…20n. Whenever the data for a queried customer or customer list is updated, the search program 121 synchronously sends the latest data to the query device 130 for display in the user interface of the query program 131. This ensures that the news and public opinion data the user 150 queries is always up to date. In some embodiments, the query device 130 has the components it needs to operate, such as a processing unit, a communication unit, and a storage unit.
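The search program 121 and its push of updated data to the query device 130 can be illustrated with an in-memory observer pattern. In this hypothetical Python sketch, `NewsStore` stands in for the database server 120, and records are keyed by a customer ID such as a unified business number. A registered callback plays the role of the display update in the query program 131. None of these names come from the original.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Listener = Callable[[str, dict], None]  # (customer_id, new_record) -> None


class NewsStore:
    """In-memory stand-in for database server 120 and its search program 121."""

    def __init__(self) -> None:
        self._by_customer: Dict[str, List[dict]] = defaultdict(list)
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def search(self, customer_ids: List[str]) -> Dict[str, List[dict]]:
        """Return a copy of the stored records for each requested customer ID."""
        return {cid: list(self._by_customer.get(cid, [])) for cid in customer_ids}

    def watch(self, customer_ids: List[str], listener: Listener) -> None:
        """Register a query device to be notified when these customers' data changes."""
        for cid in customer_ids:
            self._listeners[cid].append(listener)

    def upsert(self, customer_id: str, record: dict) -> None:
        """Called after an ETL load: store the record, then push it to every watcher."""
        self._by_customer[customer_id].append(record)
        for notify in self._listeners.get(customer_id, []):
            notify(customer_id, record)
```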

In some embodiments, the upload device 140 is coupled to the ETL server 110 and transmits the tracking list imported by the user 150 to the ETL server 110. The ETL server 110 can then use a web crawler to crawl website data related to the tracking list, such as news and public opinion data, from the news websites 200, 201…20n. The crawled data is stored in the database server 120. In some embodiments, the user 150 compiles the customers to be tracked into a tracking list and imports it into the upload device 140, which transmits it to the ETL server 110. In some embodiments, the upload device 140 is a signal converter that converts the imported tracking list into signals the ETL server 110 can read.
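The original does not specify the tracking-list format. Suppose the upload is a CSV file with the columns `uniform_number` and `name`, a format we invented for this sketch. The conversion step of the upload device 140 could then reduce to parsing and validating rows, as in the following Python sketch. `TrackedCustomer` and `parse_tracking_list` are hypothetical names.

```python
import csv
import io
from typing import List, NamedTuple


class TrackedCustomer(NamedTuple):
    uniform_number: str  # assumed column: the customer's company ID
    name: str            # assumed column: name used as a crawl keyword


def parse_tracking_list(csv_text: str) -> List[TrackedCustomer]:
    """Parse CSV with header 'uniform_number,name'; skip incomplete rows and repeated IDs."""
    seen = set()
    result = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        number = (row.get("uniform_number") or "").strip()
        name = (row.get("name") or "").strip()
        if not number or not name or number in seen:
            continue
        seen.add(number)
        result.append(TrackedCustomer(number, name))
    return result
```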

In some embodiments, the query device 130 is coupled to the database server 120 and uses a criterion to obtain the required news and public opinion data from the standardized data. The criterion uses the number of condition options (A) as the denominator and the number of set conditions satisfied (B) as the numerator. A set condition of the criterion is applied as a filter to the content of one item of news and public opinion data. If the filter yields more than 0 occurrences, that set condition is satisfied and counts as 1 toward B. Standardized data whose score (B)/(A) is closest to or equal to 1 are defined as "accurate information" and can be provided to users as "priority disclosure information". Standardized data whose score ranks next (that is, lies further from 1) are defined as "less accurate information". Through system prompts or settings, users can choose whether less accurate information is disclosed. Together, these mechanisms provide the information judgment and analysis functions.

In some embodiments, the condition options may include the following:
- Keywords. Multiple keywords are treated as separate conditions and may be entered in separate fields or combined with the symbols +, -, &, and ().
- Data source.
- Information publication time, meaning the time the information appeared on each website. It may be defined as a time range or as "before" a given time.
- Information collection time, meaning the time the system uploaded the information to the database.

In some embodiments, a user learns that Bank A will soon be penalized by the regulatory authority. The user wants to keep Bank A from also being penalized for delaying the disclosure of material information. At 16:54 on February 7, 2023 (year 112 of the Republic of China calendar), the user runs a search with the query program 131 on the query device 130. The search uses "Bank A" (condition 1) + "penalty" (condition 2), "Information source: Financial Supervisory Commission website" (condition 3), and "Information publication time: before 15:00 on February 7, 2023" (condition 4). The query device 130 searches the database server 120 and finds three groups of records:
- 23 records from the Financial Supervisory Commission website;
- 45 records matching "bank" + "penalty";
- 13 records matching the publication time.

Two records satisfy all of conditions 1 to 4. Their score, the number of satisfied set conditions (B) divided by the number of condition options (A), is 4/4 = 1. They are therefore defined as "accurate information" and actively disclosed to the user. The remaining records score less than 1, so they are "less accurate information" and are not actively disclosed.

From this analysis, the user learns of the Commission's penalty against Bank A. The penalty appears on both the "Major Penalties" page and the "Press Releases" page of the Commission's website. The user promptly notifies Bank A's head office and immediately discloses this material information to the company's shareholders. With time to spare, the user then reviews the less accurate information for the query. It shows that news of Bank A's penalty appeared on website Y and website G at 16:00 on February 7, 2023, and spread into other news reports and online discussion. To prevent further complications, the user can pass this information to the head office's public opinion management team, which can issue a statement to limit the impact on Bank A's reputation.

In some embodiments, the user remains concerned and subscribes to the same conditions, except that the information publication time is set to after 16:00 on February 7, 2023. The database server 120 runs a daily batch search and emails the aggregated results to the user's company mailbox or the query device 130. From the daily reports, the user finds that public discussion died down after 5 days and informs a supervisor so that the case can be closed.
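The (B)/(A) scoring and the split into accurate and less accurate information can be reproduced in a short Python sketch modeled on the Bank A example. The sketch makes three assumptions:

- each condition option is a function that returns an occurrence count;
- a record that matches no condition is discarded;
- "less accurate information" is every remaining record scored below the best score, following the example where all records below 1 fall into that group.

`Article`, `keyword`, `source_is`, `published_before`, `score`, and `classify` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Tuple


@dataclass
class Article:
    text: str
    source: str
    published_at: datetime


# A condition option returns an occurrence count; it is satisfied when the count is > 0.
Condition = Callable[[Article], int]


def keyword(word: str) -> Condition:
    return lambda a: a.text.count(word)


def source_is(name: str) -> Condition:
    return lambda a: int(a.source == name)


def published_before(cutoff: datetime) -> Condition:
    return lambda a: int(a.published_at < cutoff)


def score(article: Article, conditions: List[Condition]) -> float:
    """(B)/(A): satisfied conditions divided by the number of condition options."""
    satisfied = sum(1 for cond in conditions if cond(article) > 0)
    return satisfied / len(conditions)


def classify(articles: List[Article],
             conditions: List[Condition]) -> Tuple[List[Article], List[Article]]:
    """Return (accurate, less_accurate), each in descending-score order."""
    scored = [(score(a, conditions), a) for a in articles]
    scored = [(s, a) for s, a in scored if s > 0]  # assumption: drop records matching nothing
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep input order
    if not scored:
        return [], []
    best = scored[0][0]  # "closest to 1" among the candidates
    accurate = [a for s, a in scored if s == best]
    less_accurate = [a for s, a in scored if s < best]
    return accurate, less_accurate
```

Consider the four conditions of the example: two keywords, a source, and a publication cutoff of 15:00. An article that meets all four scores 4/4 = 1 and lands in the accurate group. An article that meets only the two keywords scores 2/4 = 0.5 and becomes less accurate information.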

Please refer to FIG. 3, which shows a flow chart of a method of the public opinion analysis and collection system according to an embodiment of the present invention. Please refer to FIG. 2 and FIG. 3 together. In step 310 of the public opinion analysis and collection method 300, a crawler program crawls content. In some embodiments, the ETL server 110 is coupled to a plurality of different news websites 200, 201…20n. It uses a web crawler to collect news and public opinion data from the data sources or news websites 200, 201…20n according to the tracking list that the user 150 sends through the upload device 140. The crawled data, which is website data related to the tracking list such as news web page content, is stored in the database server 120.

In step 320, the transformed crawled content is stored in the database. In some embodiments, data from different news websites 200, 201…20n has different structures and characteristics. The ETL server 110 therefore transforms the crawled content into a common format. It then cleans the content to remove noise, repair missing values and inconsistencies, and delete duplicates. The cleaned, transformed data is then loaded into the database server 120 for analysis and storage.

In step 330, the criterion is imported. In some embodiments, the query device 130 communicates with the database server 120 via the Internet. A query program 131 is pre-installed on the query device 130. The query program 131 is provided by the public opinion analysis and collection system 100 and offers a user interface through which the user 150 can use the query service of the database server 120 and view the query results. Through this user interface, the user 150 can enter a criterion, such as a specific customer or a customer list to be queried. The criterion can also be a query that uses the number of condition options (A) described above as the denominator and the number of satisfied set conditions (B) as the numerator. In some embodiments, the query uses the unified business number of a specific customer or the unified business numbers of a customer list.

In step 340, the query results are displayed. In some embodiments, the database server 120 searches for the specific customer or customer list and displays the results in the user interface of the query program 131. In some embodiments, the database server 120 first compiles the search results into the format the user requested and then displays them in the user interface of the query program 131. In some embodiments, the search results that satisfy the conditions set above are shown in two groups. The first is "accurate information", whose score (B)/(A) is closest to or equal to 1. The second is "less accurate information", whose score (B)/(A) ranks next.

Accordingly, the public opinion analysis and collection system of the present invention improves on the current practice of manually collecting news, regulatory press releases, and material information. It presents data sets as topic tags that users can browse, share, or subscribe to. To give users more precise information, the news and public opinion data in the database server can be filtered by the criterion and divided into "accurate information" and "less accurate information". Each category is then provided to users as needed, which saves them time when reviewing data.

Although the present invention has been disclosed above through embodiments, they are not intended to limit it. Anyone skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention. The scope of protection shall therefore be defined by the appended claims.

50: Functional flow chart
52: Data collection
54: Analysis function
55: By search conditions
56: Data source
57: Classification by topic
60: Data judgment criteria
62: Keyword verification of information relevance
64: Determining data provenance from the website source
66: Database
68: Query or subscription
100: Public opinion analysis and collection system
110: Extract-transform-load (ETL) server
120: Database server
121: Search program
130: Query device
131: Query program
140: Upload device
150: User
200, 201…20n: News networks
300: Public opinion analysis and collection method
310: Step
320: Step
330: Step
340: Step

The accompanying drawings are incorporated into and constitute a part of this specification. They illustrate embodiments consistent with the present invention and, together with the description, serve to explain the technical solutions of those embodiments.
Figure 1 shows a functional flow chart of a public opinion analysis and collection system according to some embodiments of the present application.
Figure 2 shows a functional block diagram of a public opinion analysis and collection system according to some embodiments of the present application.
Figure 3 shows a flow chart of the method performed by a public opinion analysis and collection system according to an embodiment of the present application.

Claims (9)

1. A public opinion analysis and collection system, comprising: an extract-transform-load server coupled to a plurality of data sources for collecting and standardizing a plurality of news public opinion data; a database server coupled to the extract-transform-load server for storing the standardized news public opinion data; and a query device coupled to the database server for obtaining the required news public opinion data from the standardized news public opinion data by means of a criterion, wherein the criterion takes the number of condition options (A) as the denominator and the number of set conditions that are satisfied (B) as the numerator; standardized news public opinion data whose score (B)/(A) under the criterion is closest to or equal to 1 is defined as "accurate information", and standardized news public opinion data whose score (B)/(A) ranks next is defined as "less accurate information".
2. The public opinion analysis and collection system of claim 1, wherein the condition options include keywords, data sources, information publication time, or information collection time.
3. The public opinion analysis and collection system of claim 1, wherein, when the number of occurrences obtained by filtering the content of one of the news public opinion data with a set condition of the criterion is greater than 0, that set condition is satisfied and its count of satisfied conditions is set to 1.
4. The public opinion analysis and collection system of claim 1, wherein the data sources include external websites, news homepage websites, competent authority announcement websites, or material information announcement websites.
5. The public opinion analysis and collection system of claim 1, wherein the database server periodically transmits the corresponding news public opinion data to the query device according to the criterion.
6. The public opinion analysis and collection system of claim 1, wherein the extract-transform-load server uses a crawler program, according to a tracking list, to collect and standardize the news public opinion data from the data sources.
7. The public opinion analysis and collection system of claim 6, further comprising an upload device coupled to the extract-transform-load server for transmitting the tracking list to the extract-transform-load server.
8. The public opinion analysis and collection system of claim 6, wherein the extract-transform-load server further uses the crawler program, according to the tracking list, to periodically collect and standardize the news public opinion data from the data sources and store it in the database server.
9. A public opinion analysis and collection method, comprising: coupling, by an extract-transform-load server, to a plurality of data sources to collect and standardize a plurality of news public opinion data; coupling, by a database server, to the extract-transform-load server to store the standardized news public opinion data; and coupling, by a query device, to the database server to obtain the required news public opinion data from the standardized news public opinion data by means of a criterion, wherein the criterion takes the number of condition options (A) as the denominator and the number of set conditions that are satisfied (B) as the numerator; standardized news public opinion data whose score (B)/(A) under the criterion is closest to or equal to 1 is defined as "accurate information", and standardized news public opinion data whose score (B)/(A) ranks next is defined as "less accurate information".

Priority Applications (1)

Application number | Priority date | Filing date | Title | Publication
TW112117805A | 2023-05-12 | 2023-05-12 | News public opinion analysis and collection system and method thereof | TWI861875B (en)

Publications (2)

Publication number | Publication date
TWI861875B | 2024-11-11
TW202445381A (en) | 2024-11-16

Family ID: 94377589

Country Status (1)

Country | Link
TW (1) | TWI861875B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
TWM595829U * | 2020-01-03 | 2020-05-21 | 華南商業銀行股份有限公司 | Financial transaction volume warning system
TWM599938U * | 2020-01-21 | 2020-08-11 | 兆豐國際商業銀行股份有限公司 | News filtering device
CN106874487B * | 2017-02-21 | 2020-08-18 | 国信优易数据有限公司 | Distributed crawler management system and method thereof
TWM617933U * | 2021-07-02 | 2021-10-01 | 大數軟體有限公司 | News and public opinion analysis system
TW202223686A * | 2020-12-07 | 2022-06-16 | 中華電信股份有限公司 | Network public opinion analysis method and server
TWM645374U * | 2023-05-12 | 2023-08-21 | 合作金庫商業銀行股份有限公司 | News public opinion analysis and collection system

Also Published As

Publication number | Publication date
TW202445381A (en) | 2024-11-16

Similar Documents

Publication number and title
CN112348602B (en) Automatic advertisement putting management system based on big data
US8725711B2 (en) Systems and methods for information categorization
US20020038430A1 (en) System and method of data collection, processing, analysis, and annotation for monitoring cyber-threats and the notification thereof to subscribers
US9043358B2 (en) Enterprise search over private and public data
US6633867B1 (en) System and method for providing a session query within the context of a dynamic search result set
US20080189274A1 (en) Systems and methods for connecting relevant web-based product information with relevant network conversations
US8306965B2 (en) System and method for generating expertise based search results
US20050149519A1 (en) Document information search apparatus and method and recording medium storing document information search program therein
US20100274821A1 (en) Schema Matching Using Clicklogs
US20080140674A1 (en) Information distribution system, information distribution apparatus, and information distribution method
CN102208992A (en) Internet-facing filtration system of unhealthy information and method thereof
WO2005013046A2 (en) Ranking search results using conversion data
AU2011202277B2 (en) Methods, Apparatus and Articles of Manufacture to Rank Web Site Influence
US20080104034A1 (en) Method For Scoring Changes to a Webpage
CN110717093A (en) Spark-based movie recommendation system and method
US20130238375A1 (en) Evaluating email information and aggregating evaluation results
CN110502692B (en) Information retrieval method, device, equipment and storage medium based on search engine
US8484217B1 (en) Knowledge discovery appliance
CN116975396B (en) Government service intelligent recommendation method, system, device and storage medium
US20100325101A1 (en) Marketing asset exchange
JP2002024291A (en) System, method, and device for user support
US7389290B2 (en) System and method for scoring new messages based on previous responses within a system for harvesting community knowledge
JPH11161670A (en) Information filtering method, apparatus and system
JP2006309515A (en) Information distribution method and information distribution server
TWM645374U (en) News public opinion analysis and collection system