
CN109300030A - Method and device for realizing stock investment recommendation - Google Patents


Info

Publication number
CN109300030A
Authority
CN
China
Prior art keywords
stock
opinion
probability
data
review
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810942583.8A
Other languages
Chinese (zh)
Inventor
王浩
张晨
庞旭林
杜长营
杨康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201810942583.8A
Publication of CN109300030A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a method and a device for realizing stock investment recommendation. The method includes: acquiring a given stock set; calculating a rise-or-fall probability for each stock in the stock set; and selecting one or more stocks for investment advice according to the rise-or-fall probability of each stock in the stock set. The method is convenient, fast, and highly accurate. It helps investors understand market trends and stock dynamics more accurately, and is intended for use by investors or stock market analysts.

Description

Method and device for realizing stock investment recommendation

Technical field

The present invention relates to the fields of artificial intelligence and big data, and in particular to a method, a device, an electronic device, and a computer-readable storage medium for realizing stock investment recommendation.

Background

Investors usually use search engines to find valuable information to support their final decisions, and most of this decision-making relies on human analysis, judgment, and experience. In fact, stock review data on the Internet contains rich and valuable semantic information that can help investors understand market trends and stock dynamics. Existing stock review analysis methods usually focus only on capturing the sentiment polarity of stock reviews in order to understand their macro-level effect on market trends. However, stock reviews on the Internet often contain a great deal of noise, such as posts by paid commenters (the "water army"), personal subjective bias, and herd mentality, which seriously affects investors' judgment. It is therefore very worthwhile to use artificial intelligence to perform a fine-grained authority analysis of stock review information, and thereby automatically select high-quality stocks from massive amounts of information for investors and stock analysts.

Summary of the invention

In view of the above problems, the present invention is proposed to provide a method, a device, an electronic device, and a computer-readable storage medium for realizing stock investment recommendation that overcome the above problems or at least partially solve them.

According to one aspect of the present invention, a method for realizing stock investment recommendation is provided, the method comprising:

acquiring a given stock set;

calculating a rise-or-fall probability for each stock in the stock set;

selecting one or more stocks for investment advice according to the rise-or-fall probability of each stock in the stock set.

According to another aspect of the present invention, a device for realizing stock investment recommendation is provided, the device comprising:

a stock set acquisition unit, adapted to acquire a given stock set;

a rise-or-fall probability calculation unit, adapted to calculate a rise-or-fall probability for each stock in the stock set;

a stock investment recommendation unit, adapted to select one or more stocks for investment advice according to the rise-or-fall probability of each stock in the stock set.

According to yet another aspect of the present invention, an electronic device is provided, the electronic device comprising: a processor, and a memory storing a computer program executable on the processor;

wherein the processor is configured to perform any of the methods described above when executing the computer program in the memory.

According to yet another aspect of the present invention, a computer-readable storage medium is provided on which a computer program is stored; when executed by a processor, the computer program implements any of the methods described above.

According to the technical solution of the present invention, a given stock set is acquired; a rise-or-fall probability is calculated for each stock in the stock set; and one or more stocks are selected for investment advice according to the rise-or-fall probability of each stock in the stock set. The solution is convenient, fast, and highly accurate. It helps investors understand market trends and stock dynamics more accurately, and is intended for use by investors or stock market analysts.

The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present invention are more apparent, specific embodiments of the present invention are set out below.

Description of the drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:

Fig. 1 shows a flowchart of a method for realizing stock investment recommendation according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of one stock review data item;

Fig. 3 is a schematic diagram of another representation of one stock review data item;

Fig. 4 is a schematic diagram of the volume of original stock review data and the volume of cleaned stock review data;

Fig. 5 is a schematic diagram of the profit obtained after selecting stocks with intelligent stock selection method c;

Fig. 6 shows a schematic diagram of a device for realizing stock investment recommendation according to an embodiment of the present invention;

Fig. 7 shows a schematic diagram of another device for realizing stock investment recommendation according to an embodiment of the present invention;

Fig. 8 shows a schematic diagram of yet another device for realizing stock investment recommendation according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of an electronic device in an embodiment of the present invention;

Fig. 10 is a schematic structural diagram of a computer-readable storage medium in an embodiment of the present invention.

Detailed description

Explanation of terms used in the present invention:

FM: Factorization Machine, a well-known machine learning algorithm based on matrix factorization, proposed by Steffen Rendle and widely used in classification and prediction models.

SVM: Support Vector Machine, a well-known algorithm and a common discriminative method. In machine learning it is a supervised learning model, typically used for pattern recognition, classification, and regression analysis.

ARMA: Auto Regressive Moving Average model, one of the high-resolution spectral analysis methods of the parametric-model approach. It is a typical method for studying the rational spectrum of stationary stochastic processes and is suitable for a large class of practical problems. Compared with the AR and MA model methods, it gives more accurate spectral estimates and better spectral resolution.

OSRatio: Opinion Shift Ratio, which characterizes the likelihood that a stock commentator changes opinion on the same stock.

TSRatio: the Ratio of True-then-Shift, which characterizes the likelihood that a stock commentator changes opinion given that the previous opinion on the stock was correct.

FSRatio: the Ratio of False-then-Shift, which characterizes the likelihood that a stock commentator changes opinion given that the previous opinion on the stock was wrong.

TCTRatio: the Reliability Ratio of True-then-Constant, which characterizes the reliability of a stock commentator who keeps an opinion after that opinion proved correct.

TSTRatio: the Reliability Ratio of True-then-Shift, which characterizes the reliability of a stock commentator who changes opinion after the previous opinion proved correct.

FCTRatio: the Reliability Ratio of False-then-Constant, which characterizes the reliability of a stock commentator who keeps an opinion after that opinion proved wrong.

FSTRatio: the Reliability Ratio of False-then-Shift, which characterizes the reliability of a stock commentator who changes opinion after the previous opinion proved wrong.

BIC: Bayesian Information Criterion. Bayesian decision theory is an important part of subjective Bayesian inductive theory: under incomplete information, partially unknown states are estimated with subjective probabilities, the probabilities of occurrence are then revised with Bayes' formula, and finally the optimal decision is made using the expected value and the revised probabilities.

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope fully conveyed to those skilled in the art.

Fig. 1 shows a flowchart of a method for realizing stock investment recommendation according to an embodiment of the present invention. As shown in Fig. 1, the method includes:

Step S11: acquiring a given stock set;

Step S12: calculating a rise-or-fall probability for each stock in the stock set;

Step S13: selecting one or more stocks for investment advice according to the rise-or-fall probability of each stock in the stock set.

Step S13 includes one or more of the following:

selecting a preset number of stocks that are predicted to rise and have the highest rise probability, with equal investment weights;

selecting a preset number of stocks that are predicted to rise and have the highest rise probability, with investment weights proportional to the rise probability;

selecting from each stock sector the one stock that is predicted to rise and has the highest rise probability, with equal investment weights;

selecting from each stock sector the one stock that is predicted to rise and has the highest rise probability, with investment weights proportional to the rise probability;

selecting from each stock sector one or more stocks that are predicted to rise and have the highest rise probability, with equal weights across sectors and, within each sector, weights proportional to the rise probability of the selected stocks.
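The first two selection strategies above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the dictionary of signed scores are assumptions, and the sign convention (a non-negative score means "rise", its magnitude is the probability) is taken from the description of cf(s_j) in this document.

```python
from typing import Dict, List, Tuple


def top_k_rising(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return up to k stocks predicted to rise, highest rise probability first.

    `scores` maps a stock symbol to a signed score: >= 0 means "rise",
    and the absolute value is the probability (hypothetical input format).
    """
    rising = [(stock, score) for stock, score in scores.items() if score >= 0]
    rising.sort(key=lambda item: item[1], reverse=True)
    return rising[:k]


def equal_weights(picks: List[Tuple[str, float]]) -> Dict[str, float]:
    """Equal (average) weighting across the selected stocks."""
    if not picks:
        return {}
    return {stock: 1.0 / len(picks) for stock, _ in picks}


def probability_weights(picks: List[Tuple[str, float]]) -> Dict[str, float]:
    """Weights proportional to each selected stock's rise probability."""
    total = sum(score for _, score in picks)
    if total == 0:
        # All probabilities are zero: fall back to equal weighting.
        return equal_weights(picks)
    return {stock: score / total for stock, score in picks}


if __name__ == "__main__":
    scores = {"A": 0.6, "B": -0.4, "C": 0.2, "D": 0.9}
    picks = top_k_rising(scores, k=2)
    print(picks)
    print(equal_weights(picks))
    print(probability_weights(picks))
```

The per-sector variants would apply the same two weighting functions after grouping the scores by sector.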

In an embodiment of the present invention, calculating the rise-or-fall probability for each stock in the stock set includes:

acquiring the stock review data set for a single stock s_j;

calculating the rise-or-fall probability cf(s_j) of the stock by a formula over the following quantities: the number of stock review data items in the stock review data set; each stock review data item c_i; the opinion polarity of that item; the reliability index of that item; and rυ(c_i), the accuracy value of the reliability classification of that item;

when cf(s_j) ≥ 0, stock s_j is predicted to rise, and the probability of rising is |cf(s_j)|;

when cf(s_j) < 0, stock s_j is predicted to fall, and the probability of falling is |cf(s_j)|.
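The sign interpretation of cf(s_j) can be sketched as follows. The aggregation formula itself is not legible in this text, so the form used here (the mean over reviews of polarity × reliability index × accuracy value) is purely an assumption chosen to be consistent with the listed quantities. The value ranges (polarity and reliability index in {+1, -1}, accuracy value in [0, 1]) are also assumptions.

```python
from typing import Iterable, Tuple

# (opinion polarity, reliability index, accuracy value) for one review.
# The tuple layout and value ranges are hypothetical.
Review = Tuple[int, int, float]


def rise_fall_score(reviews: Iterable[Review]) -> float:
    """Assumed aggregation: average of polarity * reliability * accuracy."""
    items = list(reviews)
    if not items:
        raise ValueError("at least one review is required")
    total = sum(polarity * reliability * accuracy
                for polarity, reliability, accuracy in items)
    return total / len(items)


def interpret(cf: float) -> Tuple[str, float]:
    """cf >= 0 means 'rise', cf < 0 means 'fall'; probability is |cf|."""
    direction = "rise" if cf >= 0 else "fall"
    return direction, abs(cf)


if __name__ == "__main__":
    reviews = [(1, 1, 0.8), (-1, 1, 0.4), (1, -1, 0.5)]
    print(interpret(rise_fall_score(reviews)))
```

Under this assumed form, a bullish review judged unreliable counts as evidence for a fall, which is one plausible reading of how the reliability index enters the score.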

By acquiring a given stock set, calculating a rise-or-fall probability for each stock in the stock set, and selecting one or more stocks for investment advice according to those probabilities, the method is convenient, fast, and highly accurate. It helps investors understand market trends and stock dynamics more accurately, and is intended for use by investors or stock market analysts.

In an embodiment of the present invention, the embodiment shown in Fig. 1 determines the opinion polarity of a stock review data item as follows:

obtaining a training set and a validation set consisting of stock review data, and labeling the opinion polarity of each stock review data item in the training set and the validation set;

training a machine learning model on the labeled training set and evaluating the model on the labeled validation set to obtain a trained machine learning model;

inputting the relevant information of the stock review data item to be predicted into the trained machine learning model, obtaining the opinion polarity classification output by the model for that item, and determining the opinion polarity of the item from that classification.

In an embodiment of the present invention, the reliability index of a stock review data item is determined by a formula whose terms are: the date of the review; the stock price on that date; the stock price on the following day; and the opinion polarity of the stock review.
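A reliability index built from these four terms can be sketched as follows. The exact formula is not legible in this text; the rule below (a review is reliable when its polarity agrees with the direction of the next-day price change) is an assumption, as are the function name, the {+1, -1} encoding, and the treatment of an unchanged price as "not confirmed".

```python
def reliability_index(price_on_date: float,
                      price_next_day: float,
                      polarity: int) -> int:
    """Return +1 if the next-day move agrees with the opinion, else -1.

    `polarity` is +1 for bullish and -1 for bearish (assumed encoding).
    An unchanged price confirms neither opinion, so it yields -1 here;
    that tie-breaking choice is an assumption of this sketch.
    """
    change = price_next_day - price_on_date
    return 1 if change * polarity > 0 else -1


if __name__ == "__main__":
    print(reliability_index(10.0, 11.0, polarity=1))   # bullish, price rose
    print(reliability_index(10.0, 9.0, polarity=1))    # bullish, price fell
```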

In an embodiment of the present invention, the embodiment shown in Fig. 1 determines the accuracy value rυ(c_i) of the reliability classification of a stock review data item as follows:

extracting feature vectors based on the stock review data set and the stock price sequence set;

training a support vector machine (SVM) model with a radial basis kernel function using the extracted feature vectors;

training a machine learning model for predicting stock prices using the stock price sequence set;

integrating the SVM model and the stock price prediction model to obtain a classification model for evaluating the reliability of stock reviews.

The larger the value of rυ(c_i), the more reliable the classification of the stock review's reliability.

In an embodiment of the present invention, in the embodiment shown in Fig. 1, extracting feature vectors based on the stock review data set and the stock price sequence set includes:

for each stock review data item in at least part of the stock review data set, extracting one or more of the following features to form a feature vector, where the item was published by stock commentator a, concerns stock s, and has release date t:

the bullish or bearish opinion polarity of the stock review data item;

among all stock review data for stock s published on day t, the number of bullish items and the number of bearish items;

among all stock review data for stock s published within a first preset period before day t, the number of bullish items, the number of bearish items, the number of items whose opinion was correct, and the number of items whose opinion was wrong;

the price sequence of stock s over a second preset period before day t;

the price of stock s on the next trading day as predicted by the machine learning model for predicting stock prices, together with the standard deviation output by that model;

among all stock review data published by stock commentator a within a third preset period before day t, the number of bullish items, the number of bearish items, the number of items whose opinion was correct, and the number of items whose opinion was wrong;

among the stock review data for stock s published by stock commentator a within a fourth preset period before day t, the number of bullish items, the number of bearish items, the number of items whose opinion was correct, and the number of items whose opinion was wrong;

one or more of the following, determined from the sequence of stock reviews published by stock commentator a within a fifth preset period before day t: the probability OSRatio that the commentator changes opinion; the probability TSRatio of changing opinion given that the opinion was correct; the probability FSRatio of changing opinion given that the opinion was wrong; the probability TCTRatio that, given a correct opinion, the commentator keeps it and the kept opinion is correct; the probability TSTRatio that, given a correct opinion, the commentator changes it and the changed opinion is correct; the probability FCTRatio that, given a wrong opinion, the commentator keeps it and the kept opinion is correct; and the probability FSTRatio that, given a wrong opinion, the commentator changes it and the changed opinion is correct.

In an embodiment of the present invention, in the embodiment shown in Fig. 1, training the SVM model based on the radial basis kernel function using the extracted feature vectors includes:

letting the radial basis kernel function be:

$$K(x_1, x_2) = \exp\left(-\gamma \lVert x_1 - x_2 \rVert^2\right)$$

The SVM model is:

$$f(c_i) = \omega^{T}\phi(c_i) + b$$

where $x_1$ and $x_2$ are two feature vectors and $\gamma$ is the parameter of the radial basis kernel function; the function $\phi(\cdot)$ maps the original features to a high-dimensional kernel space in which the optimal decision hyperplane is computed.

The parameters $\omega$ and $b$ of the SVM model are calculated by optimizing the following objective function:

$$\min_{\omega,\, b,\, \xi}\ \frac{1}{2}\lVert \omega \rVert^2 + C \sum_{i=1}^{N} \xi_i$$

$$\text{s.t.}\quad y_i\left(\omega^{T}\phi(c_i) + b\right) \ge 1 - \xi_i,$$

$$\xi_i \ge 0,\quad i = 1, \dots, N$$

where $C$ is the trade-off parameter between noise in the training samples and the simplicity of the separating hyperplane, and $y_i$ is the label indicating whether the opinion of the stock review is correct.
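The radial basis kernel and the way a trained SVM scores a new feature vector can be sketched in plain Python. The decision function below uses the standard dual form, in which the score is a weighted sum of kernel evaluations against the support vectors plus the bias b. The support vectors, dual coefficients, and gamma value are made-up illustrative inputs, not values from the patent, and the training step (solving the optimization problem) is not shown.

```python
import math
from typing import Sequence


def rbf_kernel(x1: Sequence[float], x2: Sequence[float], gamma: float) -> float:
    """K(x1, x2) = exp(-gamma * ||x1 - x2||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)


def decision_value(x: Sequence[float],
                   support_vectors: Sequence[Sequence[float]],
                   dual_coefs: Sequence[float],
                   bias: float,
                   gamma: float) -> float:
    """Dual-form SVM score: sum_k coef_k * K(sv_k, x) + bias.

    Each coef_k stands for (alpha_k * y_k) from a solved dual problem;
    here the values are supplied by hand for illustration only.
    """
    return sum(coef * rbf_kernel(sv, x, gamma)
               for sv, coef in zip(support_vectors, dual_coefs)) + bias


if __name__ == "__main__":
    svs = [[0.0, 0.0], [2.0, 0.0]]
    coefs = [1.0, -1.0]
    score = decision_value([0.1, 0.0], svs, coefs, bias=0.0, gamma=1.0)
    print("reliable" if score >= 0 else "unreliable", score)
```

In practice a library implementation would be used for training; this sketch only makes the role of gamma and of the kernel in the decision function concrete.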

In an embodiment of the present invention, in the embodiment shown in Fig. 1, training the machine learning model for predicting stock prices using the stock price sequence set includes:

determining the stock price sequence data used as the training set and the validation set of the model, where each sample in the training set or the validation set includes the closing prices of several consecutive days as the model input and the closing price of the following day as the label;

training an ARMA model on the training set, and verifying the model's prediction performance on the validation set.
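The sample construction described above (several consecutive closing prices as input, the next day's close as label) can be sketched as a sliding window. The window length, the 80/20 chronological split, and the function names are assumptions made for illustration; fitting the ARMA model itself is not shown.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (input closing prices, next-day close)


def make_windows(closes: Sequence[float], window: int) -> List[Sample]:
    """Slide a fixed-length window over the closing prices.

    Each sample pairs `window` consecutive closes with the close of the
    day immediately after the window, which serves as the label.
    """
    samples: List[Sample] = []
    for start in range(len(closes) - window):
        inputs = list(closes[start:start + window])
        label = closes[start + window]
        samples.append((inputs, label))
    return samples


def chronological_split(samples: List[Sample],
                        train_fraction: float = 0.8
                        ) -> Tuple[List[Sample], List[Sample]]:
    """Split without shuffling so validation data is strictly later in time."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


if __name__ == "__main__":
    closes = [10.0, 10.2, 10.1, 10.4, 10.6, 10.5]
    train, validation = chronological_split(make_windows(closes, window=3))
    print(train)
    print(validation)
```

Keeping the split chronological avoids evaluating the price model on days that precede its training data.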

In an embodiment of the present invention, integrating the SVM model and the machine learning model for predicting stock prices to obtain the classification model for evaluating the reliability of stock reviews includes:

constructing a classification equation based on the stock price predictions of the machine learning model for predicting stock prices, whose terms are: the stock price at the time of the review; the next-day stock price predicted by the model; the opinion polarity of the stock review; and err(c_i), the standard deviation of the current predicted stock price output by the model;

integrating the SVM model and the machine learning model for predicting stock prices using a weight u ∈ [0,1];

obtaining the final classification model h(c_i) for evaluating the reliability of stock reviews, where h(c_i) = 1 indicates that the stock review is reliable and h(c_i) = -1 indicates that it is unreliable.
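The integration step can be sketched as follows. The combining formula is not legible in this text, so the convex combination below (weight u on the SVM score, 1 - u on the price-based score, followed by a sign) is an assumption consistent only with the stated facts that u lies in [0, 1] and that the final output is 1 or -1. The function and parameter names are hypothetical.

```python
def combined_reliability(svm_score: float, price_score: float, u: float) -> int:
    """Assumed ensemble: h = sign(u * svm_score + (1 - u) * price_score).

    svm_score   : signed output of the SVM model for the review
    price_score : signed output of the price-based classification equation
    u           : mixing weight in [0, 1]
    Returns 1 (reliable) or -1 (unreliable); a combined score of exactly
    zero is treated as reliable, which is a choice made for this sketch.
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    combined = u * svm_score + (1.0 - u) * price_score
    return 1 if combined >= 0 else -1


if __name__ == "__main__":
    print(combined_reliability(0.8, -1.0, u=0.75))
    print(combined_reliability(0.2, -1.0, u=0.5))
```

With u = 1 the decision rests on the SVM alone and with u = 0 on the price model alone, so u controls how much each information source contributes.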

In an embodiment of the present invention, the embodiment shown in Fig. 1 determines the bullish or bearish opinion polarity of the stock review data item as follows:

obtaining a training set and a validation set consisting of stock review data, and labeling the opinion polarity of each stock review data item in the training set and the validation set;

training a machine learning model on the labeled training set and evaluating the model on the labeled validation set to obtain a trained machine learning model for predicting the opinion polarity of stock review data;

inputting the stock review data item into the machine learning model for predicting the opinion polarity of stock review data, and obtaining the opinion polarity classification output by the model for that item.

In an embodiment of the present invention, the embodiment shown in Fig. 1 determines the opinion polarity distribution of stock commentator a as follows:

extracting pairs of stock reviews from each pair of adjacent stock review data items in stock commentator a's sequence of reviews on the same stock;

determining, from the extracted pairs, the commentator's probability OSRatio of changing opinion; the probability TSRatio of changing opinion given that the opinion was correct; the probability FSRatio of changing opinion given that the opinion was wrong; the probability TCTRatio that, given a correct opinion, the commentator keeps it and the kept opinion is correct; the probability TSTRatio that, given a correct opinion, the commentator changes it and the changed opinion is correct; the probability FCTRatio that, given a wrong opinion, the commentator keeps it and the kept opinion is correct; and the probability FSTRatio that, given a wrong opinion, the commentator changes it and the changed opinion is correct.
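The first three ratios can be sketched by counting over adjacent review pairs. The counting rule here is an assumption: each ratio is estimated as a relative frequency, with the condition (correct or wrong) evaluated on the earlier review of each pair. The input format, a chronological list of (polarity, was_correct) tuples for one commentator and one stock, is also hypothetical.

```python
from typing import Dict, List, Tuple

Review = Tuple[int, bool]  # (polarity: +1 bullish / -1 bearish, opinion was correct)


def shift_ratios(sequence: List[Review]) -> Dict[str, float]:
    """Estimate OSRatio, TSRatio and FSRatio from adjacent review pairs."""
    pairs = list(zip(sequence, sequence[1:]))

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 when there is no evidence rather than dividing by zero.
        return numerator / denominator if denominator else 0.0

    def shifted(pair: Tuple[Review, Review]) -> bool:
        earlier, later = pair
        return earlier[0] != later[0]

    after_correct = [p for p in pairs if p[0][1]]
    after_wrong = [p for p in pairs if not p[0][1]]
    return {
        "OSRatio": ratio(sum(shifted(p) for p in pairs), len(pairs)),
        "TSRatio": ratio(sum(shifted(p) for p in after_correct), len(after_correct)),
        "FSRatio": ratio(sum(shifted(p) for p in after_wrong), len(after_wrong)),
    }


if __name__ == "__main__":
    history = [(1, True), (-1, False), (-1, True), (1, True)]
    print(shift_ratios(history))
```

The four reliability ratios (TCTRatio, TSTRatio, FCTRatio, FSTRatio) could be counted the same way by additionally checking whether the later review of each pair was correct.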

The present invention proposes a solution for reliability modeling of stock review data. The solution is a unified framework that fuses multiple heterogeneous information sources, such as stock price time series, the text of stock reviews, and the historical behavior of the commentators who publish them. It can effectively filter noise and select valuable, reliable stock review information for investors or stock market analysts. It applies not only to reliability analysis of stock review information but also to other areas of finance, such as economic situation analysis, precise stock recommendation, portfolio management, and automated trading. The specific implementation is as follows:

1. Cleaning of stock review data. Data cleaning provides a first pass at removing noise from stock review data collected from the Internet, and includes:

(1) Delete stock review data whose opinion polarity is neutral.

(2) Delete every stock review sequence whose length is less than 5, together with the stock review data in it.

FIG. 2 is a schematic diagram of one piece of stock review data. As shown in FIG. 2, a stock review text includes the commentator 201 (allan), the time 202 (8 days ago), the opinion polarity 203 (BUY, Bullish), the target stock 204 (IBM), the review content 205 (I think there is a support at 173.11), and other information.

Because a neutral opinion polarity is difficult to identify automatically, deleting neutral stock review data requires manual screening. A "stock review sequence with length less than 5" means that the same commentator has reviewed the same stock fewer than 5 times.
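The two cleaning rules can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary keys (`commentator`, `stock`, `polarity`) are hypothetical, the polarity labels are assumed to be already available (the text notes that neutral reviews are screened manually), and sequence length is assumed to be counted after the neutral reviews are removed.

```python
from collections import defaultdict

MIN_SEQUENCE_LENGTH = 5  # threshold stated in cleaning rule (2)


def clean_reviews(reviews):
    """Apply cleaning rules (1) and (2) to a list of review dicts."""
    # Rule (1): drop reviews whose opinion polarity is neutral.
    polar = [r for r in reviews if r["polarity"] != "neutral"]

    # Group the remaining reviews into one sequence per (commentator, stock).
    sequences = defaultdict(list)
    for r in polar:
        sequences[(r["commentator"], r["stock"])].append(r)

    # Rule (2): keep only sequences with at least MIN_SEQUENCE_LENGTH reviews.
    return [
        r
        for seq in sequences.values()
        if len(seq) >= MIN_SEQUENCE_LENGTH
        for r in seq
    ]
```

A sequence here is the list of reviews that one commentator wrote about one stock, which matches the definition of sequence length given above.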

FIG. 3 is a schematic diagram of another representation of one piece of stock review data. As the figure shows, the target stock is classified as an A-share; a questioner asks whether to buy sh60000, and stock commentator 柳岸林 answers. The review time is 2016-12-29, the opinion polarity is bullish, and the content carrying the polarity reads: the stock price has met support at the annual line, buying can be considered, this view is for reference.

FIG. 4 is a schematic diagram of the volume of original stock review data and the volume after cleaning; the data source is the Sina Financial Planner website. As the figure shows, cleaning greatly reduces the volume: it removes a large amount of noise from the stock review data and thereby reduces the computation needed for subsequent processing.

2. Mining of commentators' opinion polarity and reliability distribution patterns. A commentator's polarity tendency and reliability distribution can be mined from that commentator's historical stock reviews, including:

(1) From the commentator's historical stock reviews, compute the polarity distribution of the commentator's reviews, i.e., the probability distribution of issuing bullish versus bearish reviews. Mining a commentator's opinion polarity distribution includes one or more of four modes, summarized as one-to-one, one-to-many, many-to-one, and many-to-many. Specifically:

Based on all historical reviews by the same commentator of the same stock in the acquired stock review data, determine the probability that the commentator issues a bullish review of that stock and the probability that the commentator issues a bearish review of that stock;

Based on all historical reviews by the same commentator of different stocks in the acquired stock review data, determine the probability that the commentator issues a bullish review and the probability that the commentator issues a bearish review;

Based on all historical reviews by different commentators of the same stock in the acquired stock review data, determine the probability that commentators issue a bullish review of that stock and the probability that commentators issue a bearish review of that stock;

Based on all historical reviews by different commentators of different stocks in the acquired stock review data, determine the probability of a bullish review being issued and the probability of a bearish review being issued.
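The four modes differ only in which reviews are pooled before counting. A minimal sketch, assuming hypothetical record keys (`commentator`, `stock`, `polarity`) and reviews that have already been cleaned of neutral polarity:

```python
from collections import Counter


def polarity_distribution(reviews, commentator=None, stock=None):
    """Return (P(bullish), P(bearish)) over the reviews matching the filters.

    Leaving `commentator` or `stock` as None pools over all of them:
      commentator + stock  -> one commentator, one stock
      commentator only     -> one commentator, all stocks
      stock only           -> all commentators, one stock
      neither              -> all commentators, all stocks
    """
    selected = [
        r
        for r in reviews
        if (commentator is None or r["commentator"] == commentator)
        and (stock is None or r["stock"] == stock)
    ]
    if not selected:
        return None  # no history, so no distribution can be estimated
    counts = Counter(r["polarity"] for r in selected)
    total = len(selected)
    return counts["bullish"] / total, counts["bearish"] / total
```

Each probability is estimated as a simple relative frequency; the text does not specify any smoothing, so none is applied here.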

(2) From the commentator's historical stock reviews, compute the reliability distribution of the commentator's reviews, i.e., the probability distribution of the reviews being reliable versus unreliable.

3. Mining of commentators' opinion consistency patterns. The probability distribution of a commentator's opinion consistency is mined from that commentator's historical review sequences, including:

(1) Based on each pair of adjacent stock reviews in the same commentator's review sequence for the same stock, extract stock review data pairs, i.e., 2-gram pairs, each of which carries the opinion polarity of the two reviews;

(2) Based on the extracted stock review data pairs, compute the probability that the commentator maintains an opinion and the probability that the commentator changes it.

For example, suppose the adjacent reviews in one commentator's review sequence for one stock are: bullish, bearish, bearish, bullish, bullish. The opinion polarity 2-gram pairs are then: (bullish, bearish); (bearish, bearish); (bearish, bullish); (bullish, bullish). From these pairs, the probability that the commentator maintains an opinion, i.e., the probability of consistent opinions, is 0.5, and the probability of changing opinion is 0.5.
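The worked example above can be reproduced with a few lines. This is a minimal sketch; the function names are illustrative and not taken from the source.

```python
def opinion_pairs(sequence):
    """Return the 2-gram pairs of adjacent opinions in a review sequence."""
    return list(zip(sequence, sequence[1:]))


def consistency(sequence):
    """Return (P(maintain opinion), P(change opinion)) for one sequence."""
    pairs = opinion_pairs(sequence)
    if not pairs:
        raise ValueError("at least two reviews are needed to form a pair")
    kept = sum(1 for previous, current in pairs if previous == current)
    keep_probability = kept / len(pairs)
    return keep_probability, 1 - keep_probability


sequence = ["bullish", "bearish", "bearish", "bullish", "bullish"]
print(opinion_pairs(sequence))
print(consistency(sequence))  # (0.5, 0.5), matching the example in the text
```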

4. Mining of commentators' opinion change patterns. A commentator's opinion change pattern is mined from that commentator's historical review sequences, including:

(1) Based on each pair of adjacent stock reviews in the same commentator's review sequence for the same stock, extract stock review data pairs; that is, from the commentator's review sequence for the same stock, extract two kinds of 2-gram pairs: opinion polarity pairs and opinion correctness pairs;

(2) Based on the extracted pairs, determine the probability TSRatio that the commentator changes opinion when the previous opinion was correct, and the probability FSRatio that the commentator changes opinion when the previous opinion was wrong;

(3) Based on the extracted pairs, determine the probability TCTRatio that the commentator, having been correct, maintains the opinion and the maintained opinion is correct (i.e., the commentator was correct at the previous moment, keeps the same opinion at the next moment, and is correct again), and the probability TSTRatio that the commentator, having been correct, changes the opinion and the changed opinion is correct;

(4) Based on the extracted pairs, determine the probability FCTRatio that the commentator, having been wrong, maintains the opinion and the maintained opinion is correct (i.e., the commentator was wrong at the previous moment, keeps the same opinion at the next moment, and is correct), and the probability FSTRatio that the commentator, having been wrong, changes the opinion and the changed opinion is correct.

For example, suppose the adjacent reviews in one commentator's review sequence for one stock are: bullish, bearish, bearish, bullish, bullish. The opinion polarity 2-gram pairs are then: (bullish, bearish); (bearish, bearish); (bearish, bullish); (bullish, bullish). The corresponding opinion correctness 2-gram pairs are: (correct, correct); (wrong, correct); (correct, wrong); (correct, correct).

From these pairs, the probability TSRatio of changing opinion when the previous opinion was correct is 0.5, and the probability FSRatio of changing opinion when the previous opinion was wrong is 0. The reliability TCTRatio of maintaining an opinion that was correct is 0.25, and the reliability TSTRatio of changing an opinion that was correct is 0.25. The reliability FCTRatio of maintaining an opinion that was wrong is 0.25, and the reliability FSTRatio of changing an opinion that was wrong is 0.
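The numbers in this example are reproduced if each ratio is counted as a fraction of all extracted pairs. That normalization is an assumption inferred from the example, not something the text states explicitly. A minimal sketch that takes the polarity pairs and correctness pairs exactly as listed above:

```python
RATIO_NAMES = ["TSRatio", "FSRatio", "TCTRatio", "TSTRatio", "FCTRatio", "FSTRatio"]


def change_ratios(polarity_pairs, correctness_pairs):
    """Count each opinion-change pattern as a fraction of all pairs."""
    if not polarity_pairs or len(polarity_pairs) != len(correctness_pairs):
        raise ValueError("need two non-empty pair lists of equal length")
    counts = dict.fromkeys(RATIO_NAMES, 0)
    for (prev_op, curr_op), (prev_ok, curr_ok) in zip(polarity_pairs, correctness_pairs):
        changed = prev_op != curr_op
        if changed:
            # T = previous opinion correct, F = previous opinion wrong.
            counts["TSRatio" if prev_ok else "FSRatio"] += 1
        if curr_ok:
            # Name pattern: (T|F) previous, (C|S) continue or switch, T = now correct.
            name = ("T" if prev_ok else "F") + ("S" if changed else "C") + "TRatio"
            counts[name] += 1
    total = len(polarity_pairs)
    return {name: count / total for name, count in counts.items()}


polarity_pairs = [
    ("bullish", "bearish"),
    ("bearish", "bearish"),
    ("bearish", "bullish"),
    ("bullish", "bullish"),
]
correctness_pairs = [(True, True), (False, True), (True, False), (True, True)]
print(change_ratios(polarity_pairs, correctness_pairs))
```

With the pairs from the example this yields TSRatio 0.5, FSRatio 0, TCTRatio 0.25, TSTRatio 0.25, FCTRatio 0.25, and FSTRatio 0.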

5. Opinion polarity detection for stock reviews (o(ci)). An FM model is trained on collected historical stock review text, and the trained FM model is used to classify the opinion polarity of stock review data. The FM model is the machine learning model referred to above; it is an existing algorithmic model, but the present invention adapts it specifically for stock opinion polarity detection. The procedure includes:

(1) Obtain a training set and a validation set of stock review texts, and label each text in both sets with its opinion polarity. That is, determine training, development, and test sets of stock review texts, where the development set and the test set are similar and are collectively called the validation set. The development set is used to tune model parameters during training to obtain the best model; the test set is used to test the model after training. The polarity labels are manual: the sentiment polarity (bullish or bearish) of each stock review text in the training and test sets is annotated by hand.

(2) Segment the training set text into words and build a dictionary from the result. For example, "我认为明天股票会涨" ("I think the stock will rise tomorrow") is segmented into "我" (I), "认为" (think), "明天" (tomorrow), "股票" (stock), "会" (will), "涨" (rise); the dictionary is built by segmenting all texts in this way.

(3) Based on the dictionary, determine the TF-IDF feature of each stock review text in the training set. The feature is a vector whose size equals the dictionary size, and each dimension holds the TF-IDF value of the corresponding word for that text.

TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining. TF stands for term frequency and IDF for inverse document frequency; the TF-IDF value is their product, TF*IDF. The main idea of TF-IDF is that if a word or phrase appears frequently in one document and rarely in others, it discriminates well between categories and is suitable for classification. TF is the frequency with which a term appears in document d. The main idea of IDF is that the fewer documents contain term t, i.e., the smaller n is, the larger the IDF, and the better term t discriminates between categories. If m documents in some class C contain term t and k documents in the other classes contain t, then the total number of documents containing t is n = m + k. When m is large, n is also large, so the IDF value given by the IDF formula is small, which indicates that term t does not discriminate well between categories.

Put simply, common words that occur frequently across the documents of the training set, such as "的" and "了", have low importance, while opinion polarity words that occur in stock review texts, such as "bullish" and "bearish", have high importance. TF-IDF is the feature that measures the importance of each word in the dictionary.

To illustrate what it means that the TF-IDF feature is a dictionary-sized vector in which each dimension is the TF-IDF value of the corresponding word for the text: if 100 sentences contain 1000 distinct words in total, the vector of each sentence has 1000 dimensions. The initial vector might be [1, 0, 0, ..., 1], where 1 means the word appears in the sentence and 0 means it does not. Each entry of the initial vector is then multiplied by the TF-IDF value of the corresponding word for that text, i.e., by the word's weight, which gives the TF-IDF feature of the stock review text.
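A small sketch of the dictionary-sized TF-IDF vector follows. It assumes the texts are already segmented into words and uses one common variant, tf = count / document length and idf = log(N / df); the source does not state which TF-IDF variant is used, so both choices are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return (dictionary, vectors); each vector has one entry per dictionary word."""
    # The dictionary is the sorted set of all words seen in the training texts.
    dictionary = sorted({word for doc in tokenized_docs for word in doc})
    n_docs = len(tokenized_docs)

    # Document frequency: in how many texts each word occurs at least once.
    df = Counter(word for doc in tokenized_docs for word in set(doc))
    idf = {word: math.log(n_docs / df[word]) for word in dictionary}

    vectors = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        length = len(doc) or 1  # guard against an empty text
        vectors.append([tf[word] / length * idf[word] for word in dictionary])
    return dictionary, vectors


docs = [["stock", "will", "rise"], ["stock", "will", "fall"]]
dictionary, vectors = tfidf_vectors(docs)
print(dictionary)
print(vectors[0])
```

Words that occur in every text, here "stock" and "will", receive a weight of zero, which mirrors the remark that very common words carry little importance.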

(4) Extract features from the stock review texts of the training set, use the extracted features as the input of the machine learning model, and use the opinion polarity classification of each text as the output. That is, the TF-IDF feature of each training text is the model input, and the sentiment polarity of the review is the output: bullish or bearish, i.e., 1 or 0.

(5) Based on the opinion polarity classification output by the machine learning model and the labeled polarity of the corresponding text, compute the model's loss and learn the model's parameters from it. That is, on the training set, learn the FM model parameters by stochastic gradient descent with adaptive regularization, and tune the FM hyperparameter k by cross-validation, where the values of k are specified manually.

(6) Evaluate the FM model on the validation set. Specifically: extract features from the stock review texts of the validation set, input them into the machine learning model, and obtain the opinion polarity classification the model outputs for each text; then evaluate the model by comparing the output classification with the labeled polarity of the corresponding text.

(7) Repeat (5) and (6) until the FM model's performance meets the requirement (e.g., accuracy greater than 95%), at which point FM model training is complete.

(8) Using the trained FM model, classify the opinion polarity of each stock review text to obtain the o(ci) attribute.

(9) Calculate the reliability r(ci) of each stock review according to (Equation 1):

Here the equation's symbols denote, in order, the date, the stock price on that date, and the stock price on the following day; r(ci) is 0 or 1.

(10) Generate structured data for each stock review text. The structured data includes the commentator identifier, review time, target stock, opinion polarity, and reliability index; that is, construct the stock review unit ci = {d(ci), a(ci), s(ci), t(ci), o(ci), r(ci)}, where d(ci) is the review content, a(ci) is the commentator identifier, s(ci) is the target stock, t(ci) is the review time, o(ci) is the opinion polarity, and r(ci) is the reliability index.
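The review unit can be pictured as a small record type. In the sketch below the field names follow the symbols d, a, s, t, o, r from the text; the class name, the field types, and the `reliability` helper are hypothetical. In particular, (Equation 1) is not reproduced in this text, so the helper assumes that a review is reliable (1) when the next-day price moves in the direction its polarity implies and unreliable (0) otherwise, including when the price is unchanged.

```python
from dataclasses import dataclass
from typing import Optional

BULLISH, BEARISH = 1, 0  # output coding stated in step (4)


@dataclass
class ReviewUnit:
    d: str                   # d(ci): review content
    a: str                   # a(ci): commentator identifier
    s: str                   # s(ci): target stock
    t: str                   # t(ci): review time
    o: int                   # o(ci): opinion polarity (BULLISH or BEARISH)
    r: Optional[int] = None  # r(ci): reliability index, 0 or 1


def reliability(polarity, price_today, price_next_day):
    """Assumed rule: 1 if the next-day move agrees with the polarity, else 0."""
    if polarity == BULLISH:
        return 1 if price_next_day > price_today else 0
    return 1 if price_next_day < price_today else 0


unit = ReviewUnit(
    d="I think there is a support at 173.11",
    a="allan",
    s="IBM",
    t="2016-12-29",
    o=BULLISH,
)
unit.r = reliability(unit.o, price_today=173.0, price_next_day=174.5)  # illustrative prices
```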

6. Method for scoring the reliability of stock review information, i.e., scoring the reliability of a particular review by a particular commentator. Key features are extracted from the stock review sequences, the stock price sequences, and the commentators' historical behavior data, and the reliability of the review information is scored by an ensemble learning framework that combines a classification model with a time series analysis model. The procedure includes:

(1) Extract feature vectors from the stock review data set and the stock price sequence set. First, for each piece of stock review data in at least part of the stock review data set, extract one or more of the following features to form a feature vector:

The bullish or bearish opinion polarity of the stock review; how this polarity is determined has been described in detail in step 5 and is not repeated here.

Among all reviews of stock s published on day t, the number of bullish reviews and the number of bearish reviews;

Among all reviews of stock s published within a first preset length of time before day t, the number of bullish reviews, the number of bearish reviews, the number of reviews whose opinion was correct, and the number of reviews whose opinion was wrong;

The price sequence of stock s within a second preset length of time before day t;

The price of stock s on the next trading day as predicted by the machine learning model for stock price prediction, and the standard deviation output by that model;

Among all reviews published by stock commentator a within a third preset length of time before day t, the number of bullish reviews, the number of bearish reviews, the number of reviews whose opinion was correct, and the number of reviews whose opinion was wrong;

Among the reviews of stock s published by stock commentator a within a fourth preset length of time before day t, the number of bullish reviews, the number of bearish reviews, the number of reviews whose opinion was correct, and the number of reviews whose opinion was wrong;

One or more of the following, determined from the review sequence published by stock commentator a within a fifth preset length of time before day t: the commentator's probability of changing opinion, OSRatio; the probability of changing opinion given a correct opinion, TSRatio; the probability of changing opinion given a wrong opinion, FSRatio; the probability, given a correct opinion, of maintaining it and the maintained opinion being correct, TCTRatio; the probability, given a correct opinion, of changing it and the changed opinion being correct, TSTRatio; the probability, given a wrong opinion, of maintaining it and the maintained opinion being correct, FCTRatio; and the probability, given a wrong opinion, of changing it and the changed opinion being correct, FSTRatio;

Here, the commentator of the stock review in question is a, the stock reviewed is s, and the publication date is t.

How the opinion polarity distribution information of stock commentator a is determined has been described in detail in step 3 and is not repeated here.

For example, key features are extracted from the stock review sequences, stock price sequences, and commentators' historical behavior data. The key features include opinion polarity, historical stock status, price time series, and commentator historical behavior. Opinion polarity is whether the current review is bullish or bearish. Historical stock status covers two cases: first, regardless of time, among all reviews of stock s published on the current day, the number of bullish reviews and the number of bearish reviews; second, among all reviews of stock s in the past 7 days, the numbers of bullish, bearish, correct, and wrong reviews. The price time series includes the price sequence of stock s over the past 25 days, together with the next-day price predicted by the ARMA model and the standard deviation it outputs. Commentator historical behavior includes: the number of bullish/bearish/correct/wrong reviews made by a commentator a in the past 7/30/90 days; the number of bullish/bearish/correct/wrong reviews that commentator made on the current stock in the past 7/30/90 days; and one or more of OSRatio, TSRatio, FSRatio, TCTRatio, and TSTRatio determined from the review sequence commentator a published in the past 7/30/90 days.
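The commentator-history counts over 7/30/90-day windows can be sketched as below. The record keys (`date`, `commentator`, `stock`, `polarity`, `correct`) are hypothetical, reviews are assumed to be already cleaned of neutral polarity, and the window is assumed to cover the `days` days strictly before the current date; the source does not state whether the current day is included.

```python
from datetime import date, timedelta


def window_counts(reviews, today, days, commentator=None, stock=None):
    """Count bullish/bearish/correct/wrong reviews in the `days` days before `today`."""
    start = today - timedelta(days=days)
    counts = {"bullish": 0, "bearish": 0, "correct": 0, "wrong": 0}
    for r in reviews:
        if not (start <= r["date"] < today):
            continue  # outside the look-back window
        if commentator is not None and r["commentator"] != commentator:
            continue
        if stock is not None and r["stock"] != stock:
            continue
        counts[r["polarity"]] += 1
        counts["correct" if r["correct"] else "wrong"] += 1
    return counts


def commentator_features(reviews, today, commentator, stock, windows=(7, 30, 90)):
    """For each window: counts over all stocks, then counts for the current stock."""
    features = []
    for days in windows:
        for target in (None, stock):
            c = window_counts(reviews, today, days, commentator, target)
            features += [c["bullish"], c["bearish"], c["correct"], c["wrong"]]
    return features
```

The resulting list would be concatenated with the other features (polarity, per-stock counts, price sequence, predicted price, and change ratios) to form the full feature vector.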

(2) Use the extracted feature vectors to train a support vector machine (SVM) model based on the radial basis kernel function (Equation 2):

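Let the radial basis kernel function be:

$$K(x_1, x_2) = \phi(x_1)^{T}\phi(x_2) = \exp\left(-\gamma \lVert x_1 - x_2 \rVert^{2}\right) \qquad \text{(Equation 2)}$$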

where x1 and x2 are two feature vectors, which may also be called variables; γ is the parameter of the radial basis kernel function and is generally set to 1 divided by the total number of features (for example, with 10,000 features, γ is set to 0.0001); and φ(·) maps the original features into a high-dimensional kernel space to facilitate computing the optimal decision hyperplane (Equation 3);
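As a concrete illustration of the kernel and of the default parameter choice described here, the following sketch assumes the standard Gaussian form exp(-γ‖x1 - x2‖²) and sets γ to 1 divided by the number of features when no value is given. The function name is illustrative.

```python
import math


def rbf_kernel(x1, x2, gamma=None):
    """Radial basis kernel value for two equal-length feature vectors."""
    if len(x1) != len(x2):
        raise ValueError("feature vectors must have the same length")
    if gamma is None:
        gamma = 1.0 / len(x1)  # 1 / number of features, as described in the text
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)


print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # identical vectors give 1.0
print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))  # gamma = 0.5, squared distance = 2
```

The kernel value is 1 for identical vectors and decays toward 0 as the vectors move apart, which is what lets the SVM separate points that are not linearly separable in the original feature space.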

The SVM model is:

The principle of the SVM is to find the separating hyperplane that correctly divides the training data set with the largest geometric margin. The input is a set of feature sample points, and the model learns a hyperplane that achieves two things: 1. all data points are divided perfectly into two classes, where the output for the first class is 1 (a reliable stock review) and the output for the second class is 0 (an unreliable stock review); 2. all data points lie as far from the hyperplane as possible.

If the feature sample points are not linearly separable in the original space (which is true in the vast majority of cases), they are mapped into a high-dimensional space in which the problem becomes linearly separable; the mapping used is the kernel function.

(3) Compute the parameters ω and b by solving the optimization problem (Equation 4):

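$$\min_{\omega,\,b,\,\xi}\ \frac{1}{2}\,\omega^{T}\omega + C\sum_{i=1}^{N}\xi_i$$

$$\text{s.t.}\quad y_i\left(\omega^{T}\phi(c_i) + b\right) \ge 1 - \xi_i,$$

$$\xi_i \ge 0,\quad i = 1, \dots, N \qquad \text{(Equation 4)}$$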

where C is the trade-off parameter between noise in the training samples and a simpler separating hyperplane, and yi is the label indicating whether the opinion of the stock review is correct. ω, b, and ξ are all learned during model training; ω and b are the two parameters the SVM model uses at prediction time. "s.t." marks what follows as constraints on what precedes it, i.e., the last two lines are the constraints on the objective function in the first line. The boundary (margin) sought by the objective function should be as large as possible.
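To make the roles of C and the slack variables ξ concrete, the sketch below evaluates the standard soft-margin objective for a given ω and b. It is illustrative only: it assumes labels of +1 and -1, uses the identity map in place of φ, and takes each slack as the smallest value that satisfies the constraints, ξi = max(0, 1 - yi(ω·xi + b)). It evaluates the objective and does not solve the optimization.

```python
def soft_margin_objective(w, b, C, samples, labels):
    """Return (objective value, slacks) for a linear soft-margin SVM."""
    slacks = []
    for x, y in zip(samples, labels):
        decision = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Smallest slack that satisfies y * decision >= 1 - slack and slack >= 0.
        slacks.append(max(0.0, 1.0 - y * decision))
    regularizer = 0.5 * sum(wi * wi for wi in w)
    return regularizer + C * sum(slacks), slacks


value, slacks = soft_margin_objective(
    w=[1.0, 0.0],
    b=0.0,
    C=1.0,
    samples=[[2.0, 0.0], [-0.5, 0.0]],
    labels=[1, -1],
)
print(value, slacks)  # 1.0 [0.0, 0.5]
```

A larger C penalizes slack more heavily, so the fitted hyperplane tolerates fewer margin violations at the cost of a narrower margin.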

(4) Use the stock price sequence set to train a machine learning model for predicting stock prices, such as an ARMA model, including:

a. Determine the stock price sequence data used as the model's training set and test set. Each record in the training set or test set consists of the closing prices of several consecutive days, used as the model input, and the closing price of the following day, used as the label;

b. Train the ARMA model on the training set and verify its predictions on the validation set. That is, estimate the ARMA model parameters on the training set by maximum likelihood, tune the orders p and q using the BIC criterion, use the trained ARMA model to predict a stock's next-day price from its historical price data, and verify the prediction on the validation set.

In summary, stock price prediction based on a time series analysis model uses the historical price series of a stock to train an ARMA model and then uses the trained model to predict the stock's price on the following day.
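The record layout described in step a, several consecutive closing prices as input and the next day's closing price as label, can be built with a sliding window. This sketch covers only the data preparation; fitting the ARMA model itself is not shown, and the function name and window length are illustrative.

```python
def make_price_windows(closing_prices, window):
    """Return (inputs, label) records: `window` consecutive closes, then the next close."""
    if window <= 0:
        raise ValueError("window must be a positive number of days")
    return [
        (closing_prices[i : i + window], closing_prices[i + window])
        for i in range(len(closing_prices) - window)
    ]


records = make_price_windows([10.0, 10.5, 10.2, 10.8, 11.0], window=3)
for inputs, label in records:
    print(inputs, "->", label)
```

Splitting the resulting records chronologically, rather than at random, keeps later prices out of the training set.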

(5) Integrate the SVM model with the machine learning model for stock price prediction to obtain a classification model for evaluating the reliability of stock reviews; that is, construct a classification equation from the stock price prediction result, as in Equation 5:

Here the equation's symbols denote, in order, the stock price at the given time, the predicted stock price for the following day, and the sentiment polarity of the stock review; err(ci) is the standard deviation of the stock price sequence data, i.e., the error, or confidence value, of the price prediction currently output by the model.

(6)集成SVM模型和ARMA模型,得到最终的分类函数,如下式6:(6) Integrate the SVM model and the ARMA model to obtain the final classification function, as shown in Equation 6 below:

h(ci)为1时,表示股评可靠;h(ci)为-1时,表示股评不可靠。其中计算公式如下式7:When h(c i ) is 1, it indicates that the stock rating is reliable; when h( ci ) is -1, it indicates that the stock rating is unreliable. in The calculation formula is as follows:

式7中u∈[0,1],是SVM和ARMA模型预测结果的加权系数,通过实验确定u=0.59效果最好。In formula 7, u∈[0, 1] is the weighting coefficient of the prediction results of the SVM and ARMA models, and it is determined by experiments that u=0.59 has the best effect.

股评可靠性分类准确值可根据下式8计算得到:The accurate value of the stock rating reliability classification can be calculated according to the following formula 8:

当rυ(ci)越高时,对股评分类结果越可靠。(式8)是(式7)的输出结果的绝对值。The higher the (ci ), the more reliable the results of rating the stocks. (Equation 8) is the absolute value of the output result of (Equation 7).
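The relationship between Equations 6 to 8 can be sketched as below. The equations themselves are not reproduced in this text, so the sketch rests on assumptions: that Equation 7 is a convex combination in which the SVM score is weighted by u and the ARMA-based score by 1 - u, that h(c_i) is the sign of that combination, and that a score of exactly 0 counts as reliable. All function names are hypothetical.

```python
def combine_scores(svm_score: float, arma_score: float, u: float = 0.59) -> float:
    """Assumed form of Equation 7: weighted sum of the two model outputs."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    return u * svm_score + (1.0 - u) * arma_score


def classify_review(score: float) -> int:
    """Assumed form of Equation 6: 1 means reliable, -1 means unreliable."""
    return 1 if score >= 0 else -1


def classification_confidence(score: float) -> float:
    """Equation 8 as described in the text: absolute value of the Equation 7 output."""
    return abs(score)


# Hypothetical scores where the two models disagree.
score = combine_scores(svm_score=1.0, arma_score=-1.0)
label = classify_review(score)
confidence = classification_confidence(score)
```

With the two models disagreeing at equal strength, u = 0.59 lets the SVM score decide the label, but the resulting confidence is low.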

7. Calculating the probability that a stock rises or falls. The probability is calculated from the features extracted during the stock review reliability measurement and from the measurement results, as follows:

(1) Calculate the rise/fall probability cf(s_j) of the stock according to Equation 9:

In Equation 9, the terms denote: the number of stock reviews in the stock review data set, i.e. the total count of all stock reviews; c_i, one stock review; the opinion polarity of that review; the reliability index of that review; and rυ(c_i), the accuracy value of the reliability classification of that review.

(2) Predict whether the stock rises or falls according to Equation 10:

(3) Calculate the probability that the stock rises or falls according to Equation 11:

w(s_j) = |cf(s_j)|. (Equation 11)

When cf(s_j) ≥ 0, a larger w(s_j) means a higher probability that the stock rises; when cf(s_j) < 0, a larger w(s_j) means a higher probability that the stock falls.
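Steps (2) and (3) can be sketched as below, taking cf(s_j) as an already computed input because Equation 9 is not reproduced in this text. The function names and the string labels "rise" and "fall" are illustrative assumptions.

```python
def predict_direction(cf: float) -> str:
    """Direction rule described in the text: non-negative cf means rise, negative means fall."""
    return "rise" if cf >= 0 else "fall"


def direction_probability(cf: float) -> float:
    """Equation 11: w(s_j) = |cf(s_j)|."""
    return abs(cf)


# Hypothetical cf values for two stocks.
up = (predict_direction(0.35), direction_probability(0.35))
down = (predict_direction(-0.6), direction_probability(-0.6))
```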

8. Once the stock review reliability model is complete, when a query request for specified opinion information about a stock reviewer is received, the result data corresponding to the query can be output.

9. Stock investment based on the stock review reliability model. Reliable stock reviews are screened with the stock review data reliability model and investments are made accordingly, as follows:

(1) For every stock in the stock pool, calculate the probability w(s_j) that the stock rises or falls, where s_j is a single stock.

(2) Several intelligent stock selection methods are available:

a. Select a preset number of stocks that are predicted to rise and have the highest rise probability, and weight the investment equally. That is, recommend the K stocks with the highest rise index and invest G/K yuan in each, where G is the total investment amount.

b. Select a preset number of stocks that are predicted to rise and have the highest rise probability, and weight the investment by rise probability. That is, recommend the K stocks with the highest rise index and weight the amount invested in each stock s_j by its rise index.

c. From each stock sector, select the one stock that is predicted to rise with the highest rise probability, and weight the investment equally. That is, recommend the stock with the highest rise index in each of the M sectors (M = 10, see Table 1) and invest G/M yuan in each.

Table 1: Sectors of stock symbols

Table 1 lists the stock sectors; Category is the sector name and #Covered Symbols is the number of stocks in the sector.

d. From each stock sector, select the one stock that is predicted to rise with the highest rise probability, and weight the investment by rise probability. That is, recommend the stock with the highest rise index in each of the M sectors (M = 10) and weight the amount invested in each stock s_j by its rise index.

e. From each stock sector, select one or more stocks that are predicted to rise with the highest rise probability, weight equally across sectors, and weight by rise probability among the stocks selected within each sector. This is a combination of the methods above: for example, first select the K_m stocks with the highest rise index from each sector, then invest in each stock using either equal weighting or rise-index weighting. The total investment in each sector can likewise be weighted equally or by rise index.
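Selection methods a to d can be sketched as below. This is an illustration under assumptions: the rise index is taken to be w(s_j) = |cf(s_j)| for stocks with cf(s_j) ≥ 0, rise-index weighting is taken to mean amounts proportional to w(s_j), and the function names, tickers, and sector labels are hypothetical.

```python
from typing import Dict, List, Tuple

Pick = Tuple[str, float]  # (stock symbol, rise index w)


def top_k_rising(cf_by_stock: Dict[str, float], k: int) -> List[Pick]:
    """Methods a/b: the k stocks predicted to rise with the highest rise index."""
    rising = [(s, abs(cf)) for s, cf in cf_by_stock.items() if cf >= 0]
    rising.sort(key=lambda item: item[1], reverse=True)
    return rising[:k]


def best_per_sector(cf_by_stock: Dict[str, float], sector_of: Dict[str, str]) -> List[Pick]:
    """Methods c/d: the single best rising stock in each sector that has one."""
    best: Dict[str, Pick] = {}
    for stock, cf in cf_by_stock.items():
        if cf < 0:
            continue  # predicted to fall, never recommended
        sector = sector_of[stock]
        if sector not in best or abs(cf) > best[sector][1]:
            best[sector] = (stock, abs(cf))
    return list(best.values())


def equal_weights(picks: List[Pick], total: float) -> Dict[str, float]:
    """Methods a/c: the same amount for every selected stock."""
    if not picks:
        return {}
    share = total / len(picks)
    return {stock: share for stock, _ in picks}


def probability_weights(picks: List[Pick], total: float) -> Dict[str, float]:
    """Methods b/d (assumed form): amounts proportional to the rise index."""
    weight_sum = sum(w for _, w in picks)
    if weight_sum == 0:
        return equal_weights(picks, total)
    return {stock: total * w / weight_sum for stock, w in picks}


# Hypothetical cf values and sectors.
cf = {"A": 0.6, "B": 0.2, "C": -0.9, "D": 0.4}
sectors = {"A": "tech", "B": "tech", "C": "energy", "D": "finance"}
picks = top_k_rising(cf, k=2)
equal = equal_weights(picks, total=10000)
weighted = probability_weights(picks, total=10000)
sector_picks = best_per_sector(cf, sectors)
```

Method e combines these pieces: pick several stocks per sector, split the total across sectors with one weighting function, then split each sector's share across its stocks with the other.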

Figure 5 is a schematic diagram of the profit obtained with intelligent stock selection method c. Method c was used for simulated investment from January 2016 to December 2016, selecting K stocks on each trading day. The total investment was 10,000 yuan, with K = M and 10000/M yuan invested in each stock; the resulting profit is shown in Figure 5.

Figure 6 shows a schematic diagram of an apparatus for implementing stock investment recommendation according to an embodiment of the present invention. The apparatus 600 includes:

a stock set acquisition unit 601, adapted to acquire a given stock set;

a rise/fall probability calculation unit 602, adapted to calculate a rise/fall probability for each stock in the stock set;

a stock investment recommendation unit 603, adapted to select one or more stocks for investment advice according to the rise/fall probabilities of the stocks in the stock set.

In an embodiment of the present invention, the stock investment recommendation unit is adapted to make stock investment recommendations in one or more of the following ways:

selecting a preset number of stocks that are predicted to rise and have the highest rise probability, with equal investment weights;

selecting a preset number of stocks that are predicted to rise and have the highest rise probability, with investment weights proportional to rise probability;

selecting from each stock sector the one stock that is predicted to rise with the highest rise probability, with equal investment weights;

selecting from each stock sector the one stock that is predicted to rise with the highest rise probability, with investment weights proportional to rise probability;

selecting from each stock sector one or more stocks that are predicted to rise with the highest rise probability, weighting equally across sectors and by rise probability among the stocks selected within each sector.

Figure 7 shows a schematic diagram of another apparatus for implementing stock investment recommendation according to an embodiment of the present invention. The apparatus 70 includes a stock set acquisition unit 601, a rise/fall probability calculation unit 602, and a stock investment recommendation unit 603. The rise/fall probability calculation unit 602 includes:

an acquisition unit 701, adapted to acquire the stock review data set for a single stock s_j;

a calculation unit 702, adapted to calculate the rise/fall probability cf(s_j) of the stock according to the following formula:

where the terms denote: the number of stock reviews in the stock review data set; c_i, one stock review; the opinion polarity of that review; the reliability index of that review; and rυ(c_i), the accuracy value of the reliability classification of that review;

when cf(s_j) ≥ 0, stock s_j is predicted to rise, and the probability of rising is |cf(s_j)|;

when cf(s_j) < 0, stock s_j is predicted to fall, and the probability of falling is |cf(s_j)|.

Figure 8 shows a schematic diagram of yet another apparatus for implementing stock investment recommendation according to an embodiment of the present invention. The apparatus 80 includes a stock set acquisition unit 601, a rise/fall probability calculation unit 602, and a stock investment recommendation unit 603. The rise/fall probability calculation unit 602 includes an acquisition unit 701 and a calculation unit 702, and the calculation unit 702 includes an opinion polarity prediction unit 801, a stock review reliability determination unit 802, and a stock review reliability classification unit 803.

The opinion polarity prediction unit 801 is adapted to determine the opinion polarity of a piece of stock review data as follows:

obtain a training set and a validation set consisting of stock review data, and label the opinion polarity of each piece of stock review data in both sets;

train a machine learning model on the labeled training set and evaluate it on the labeled test set to obtain the trained machine learning model;

input the relevant information of the stock review data to be predicted into the trained machine learning model, obtain the opinion polarity classification information that the model outputs for that stock review data, and determine the opinion polarity of the stock review data from that classification information.

The stock review reliability determination unit 802 is adapted to determine the reliability index of a piece of stock review data according to the following formula:

where the terms denote the date, the stock price on that date, the stock price on the following day, and the opinion of the stock review.
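One way to read the reliability index is sketched below. The formula is not reproduced in this text, so everything here is an assumption: the polarity is encoded as +1 (bullish) or -1 (bearish), a review counts as correct when the next-day price moves in the direction of its polarity, and an unchanged price counts as incorrect. The function name `reliability_index` is hypothetical.

```python
def reliability_index(price_today: float, price_next: float, polarity: int) -> int:
    """Return 1 if the review's direction matched the next-day move, else -1 (assumed rule)."""
    if polarity not in (1, -1):
        raise ValueError("polarity must be +1 (bullish) or -1 (bearish)")
    movement = price_next - price_today
    return 1 if movement * polarity > 0 else -1


# Hypothetical prices: a bullish review before a rise, and a bullish review before a fall.
matched = reliability_index(10.0, 10.5, polarity=1)
missed = reliability_index(10.0, 9.5, polarity=1)
```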

The stock review reliability classification unit 803 is adapted to determine the accuracy value rυ(c_i) of the reliability classification of a piece of stock review data as follows:

extract feature vectors from the stock review data set and the stock price sequence set;

use the extracted feature vectors to train a support vector machine (SVM) model based on a radial basis kernel function;

use the stock price sequence set to train a machine learning model for predicting stock prices;

integrate the SVM model and the machine learning model for predicting stock prices to obtain a classification model for evaluating the reliability of stock reviews;

the larger the value of rυ(c_i), the more reliable the classification result for stock review reliability.

The stock review reliability classification unit 803 is adapted to extract, for each piece of stock review data in at least part of the stock review data set, one or more of the following features to form a feature vector:

the bullish or bearish opinion polarity of the stock review;

among all stock reviews for stock s published on day t, the number of bullish reviews and the number of bearish reviews;

among all stock reviews for stock s published within a first preset length of time preceding day t, the number of bullish reviews, the number of bearish reviews, the number of reviews whose opinion was correct, and the number of reviews whose opinion was wrong;

the price sequence of stock s within a second preset length of time preceding day t;

the price of stock s on the next trading day as predicted by the machine learning model for predicting stock prices, and the standard deviation output by that model;

among all stock reviews published by stock reviewer a within a third preset length of time preceding day t, the number of bullish reviews, the number of bearish reviews, the number of reviews whose opinion was correct, and the number of reviews whose opinion was wrong;

among the stock reviews for stock s published by stock reviewer a within a fourth preset length of time preceding day t, the number of bullish reviews, the number of bearish reviews, the number of reviews whose opinion was correct, and the number of reviews whose opinion was wrong;

one or more of the following, determined from the sequence of stock reviews published by stock reviewer a within a fifth preset length of time preceding day t: the reviewer's probability of changing opinion (OSRatio); the probability of changing opinion given that the opinion was correct (TSRatio); the probability of changing opinion given that the opinion was wrong (FSRatio); the probability, given that the opinion was correct, of keeping it with the kept opinion being correct (TCTRatio); the probability, given that the opinion was correct, of changing it with the changed opinion being correct (TSTRatio); the probability, given that the opinion was wrong, of keeping it with the kept opinion being correct (FCTRatio); and the probability, given that the opinion was wrong, of changing it with the changed opinion being correct (FSTRatio);

where the stock review was written by stock reviewer a, concerns stock s, and was published on date t.

The stock review reliability classification unit 803 trains the SVM model based on the radial basis kernel function from the extracted feature vectors, specifically as follows:

Let the radial basis kernel function be:

The SVM model is:

where x_1 and x_2 are two feature vectors and γ is the parameter of the radial basis kernel function; the function φ(·) maps the original features into a high-dimensional kernel space in which the optimal decision hyperplane is computed;

The parameters ω and b of the SVM model are computed by optimizing the following objective function:

s.t. y_i(ω^T φ(c_i) + b) ≥ 1 - ξ_i,

ξ_i ≥ 0, i = 1, …, N,

where C is the trade-off parameter between tolerating noise in the training samples and keeping the classification hyperplane simple, and y_i is the label indicating whether the stock review's opinion is correct.
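The two ingredients named above can be illustrated with a small sketch. The kernel and objective formulas are not reproduced in this text, so the sketch assumes the standard Gaussian form of the radial basis kernel, K(x_1, x_2) = exp(-γ‖x_1 - x_2‖²), and computes the slack ξ_i as the smallest value satisfying the two constraints. The function names are hypothetical, and no SVM solver is included.

```python
import math
from typing import Sequence


def rbf_kernel(x1: Sequence[float], x2: Sequence[float], gamma: float) -> float:
    """Standard radial basis kernel (assumed form): exp(-gamma * squared distance)."""
    if len(x1) != len(x2):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)


def slack(label: int, decision_value: float) -> float:
    """Smallest xi_i with label * decision_value >= 1 - xi_i and xi_i >= 0."""
    return max(0.0, 1.0 - label * decision_value)


# Hypothetical feature vectors and decision values.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
inside_margin = slack(label=1, decision_value=0.3)
well_classified = slack(label=-1, decision_value=-2.0)
```

A sample classified with a margin of at least 1 needs no slack, while one inside the margin or misclassified contributes a positive ξ_i, which C then penalizes.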

The stock review reliability classification unit 803 trains the machine learning model for predicting stock prices from the stock price sequence set, specifically as follows:

determine the stock price sequence data used as the model's training set and test set, where each record in the training set or test set comprises the closing prices of several consecutive days, which are the model input, and the closing price of the following day, which is the label;

train the ARMA model on the training set and verify its predictive performance on the validation set.

The stock review reliability classification unit 803 integrates the SVM model and the machine learning model for predicting stock prices to obtain the classification model for evaluating the reliability of stock reviews, specifically as follows:

based on the price predictions of the machine learning model for predicting stock prices, construct the following classification equation:

where the terms denote the stock price at the current time, the following day's stock price as predicted by the machine learning model for predicting stock prices, and the opinion polarity of the stock review; err(c_i) is the standard deviation of the current predicted price output by the machine learning model for predicting stock prices;

integrate the SVM model and the machine learning model for predicting stock prices, with weighting coefficient u ∈ [0, 1];

the final classification model for evaluating the reliability of stock reviews is:

where h(c_i) = 1 indicates that the stock review is reliable and h(c_i) = -1 indicates that it is unreliable.

The stock review reliability classification unit 803 is adapted to determine the bullish or bearish opinion polarity of the stock review as follows:

obtain a training set and a validation set consisting of stock review data, and label the opinion polarity of each piece of stock review data in both sets;

train a machine learning model on the labeled training set and evaluate it on the labeled test set to obtain the trained machine learning model for predicting the opinion polarity of stock review data;

input the stock review into the machine learning model for predicting the opinion polarity of stock review data, and obtain the opinion polarity classification information that the model outputs for that review.

The stock review reliability classification unit 803 is adapted to determine the opinion polarity distribution information of stock reviewer a as follows:

extract pairs of stock reviews from each two adjacent reviews in stock reviewer a's sequence of reviews on the same stock;

from the extracted pairs, determine stock reviewer a's probability of changing opinion (OSRatio); the probability of changing opinion given that the opinion was correct (TSRatio); the probability of changing opinion given that the opinion was wrong (FSRatio); the probability, given that the opinion was correct, of keeping it with the kept opinion being correct (TCTRatio); the probability, given that the opinion was correct, of changing it with the changed opinion being correct (TSTRatio); the probability, given that the opinion was wrong, of keeping it with the kept opinion being correct (FCTRatio); and the probability, given that the opinion was wrong, of changing it with the changed opinion being correct (FSTRatio).
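The pair-based ratios can be sketched as below. The text names the ratios but does not give their formulas, so the denominators are assumptions: OSRatio is taken over all adjacent pairs, the ratios conditioned on a correct opinion are taken over pairs whose first review was correct, and those conditioned on a wrong opinion over pairs whose first review was wrong. The `Review` encoding and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Review = Tuple[int, bool]  # (polarity: +1 bullish / -1 bearish, opinion was correct)


def _ratio(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else 0.0


def opinion_switch_ratios(reviews: List[Review]) -> Dict[str, float]:
    """Compute the seven ratios from adjacent review pairs of one reviewer on one stock."""
    pairs = list(zip(reviews, reviews[1:]))

    def count(first_correct: Optional[bool] = None,
              switched: Optional[bool] = None,
              second_correct: Optional[bool] = None) -> int:
        total = 0
        for (p1, c1), (p2, c2) in pairs:
            if first_correct is not None and c1 != first_correct:
                continue
            if switched is not None and (p1 != p2) != switched:
                continue
            if second_correct is not None and c2 != second_correct:
                continue
            total += 1
        return total

    after_correct = count(first_correct=True)
    after_wrong = count(first_correct=False)
    return {
        "OSRatio": _ratio(count(switched=True), len(pairs)),
        "TSRatio": _ratio(count(True, True), after_correct),
        "FSRatio": _ratio(count(False, True), after_wrong),
        "TCTRatio": _ratio(count(True, False, True), after_correct),
        "TSTRatio": _ratio(count(True, True, True), after_correct),
        "FCTRatio": _ratio(count(False, False, True), after_wrong),
        "FSTRatio": _ratio(count(False, True, True), after_wrong),
    }


# Hypothetical review sequence for one reviewer and one stock.
history = [(1, True), (1, True), (-1, False), (1, True), (1, False)]
ratios = opinion_switch_ratios(history)
```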

In summary, the method obtains a given stock set, calculates a rise/fall probability for each stock in the set, and selects one or more stocks for investment advice according to those probabilities. The invention applies special processing and training to an existing machine learning model to classify the opinion polarity of stock review data, so that once the relevant information of a stock review text is input into the trained model, the model outputs the opinion polarity classification of that text. The rise/fall probability of each stock is then calculated and used to guide investment, which is convenient, fast, and highly accurate. The approach fuses several heterogeneous information sources, such as stock price time series, the text of stock reviews, and the historical behavior of the reviewers who published them. Based on this multi-source heterogeneous big data, data mining techniques are used to analyze the data in depth and extract key features, and these features are used to measure stock review reliability. This effectively filters noise, screens valuable and reliable stock reviews out of massive information, and selects high-quality stocks, helping investors and stock market analysts understand market trends and stock dynamics more accurately. The method applies not only to reliability analysis of stock review information but also to other areas of finance, such as economic situation analysis, precise stock recommendation, portfolio management, and automated trading.

It should be noted that:

The algorithms and displays provided herein are not inherently related to any particular computer, virtual device, or other apparatus. Various general-purpose devices may also be used with the teachings herein. From the description above, the structure required to construct such devices is apparent. Furthermore, the present invention is not directed to any particular programming language. It should be understood that the invention described herein may be implemented in a variety of programming languages, and that the description of a specific language above is given to disclose the best mode of the invention.

Numerous specific details are set forth in the description provided herein. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. The claims following the Detailed Description are therefore expressly incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and placed in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may also be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.

Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination of the two. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the apparatus for implementing stock investment recommendation, the electronic device, and the computer-readable storage medium according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet site, provided on a carrier signal, or provided in any other form.

例如,图9是本发明实施例中的电子设备的结构示意图。该电子设备900包括:处理器910,以及存储有可在所述处理器910上运行的计算机程序的存储器920。处理器910,用于在执行所述存储器920中的计算机程序时执行本发明中方法的各步骤。存储器920可以是诸如闪存、EEPROM(电可擦除可编程只读存储器)、EPROM、硬盘或者ROM之类的电子存储器。存储器920具有存储用于执行上述方法中的任何方法步骤的计算机程序931的存储空间930。计算机程序931可以从一个或者多个计算机程序产品中读出或者写入到这一个或者多个计算机程序产品中。这些计算机程序产品包括诸如硬盘,紧致盘(CD)、存储卡或者软盘之类的程序代码载体。这样的计算机程序产品通常为例如图10所述的计算机可读存储介质。For example, FIG. 9 is a schematic structural diagram of an electronic device in an embodiment of the present invention. The electronic device 900 includes a processor 910 , and a memory 920 storing a computer program executable on the processor 910 . The processor 910 is configured to execute each step of the method in the present invention when executing the computer program in the memory 920 . The memory 920 may be electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM. The memory 920 has storage space 930 for storing a computer program 931 for performing any of the method steps in the above-described methods. The computer program 931 can be read from or written to one or more computer program products. These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks. Such a computer program product is typically a computer-readable storage medium as described in FIG. 10 .

图10是本发明实施例中的一种计算机可读存储介质的结构示意图。该计算机可读存储介质1000存储有用于执行根据本发明的方法步骤的计算机程序931,可以被电子设备900的处理器910读取,当计算机程序931由电子设备900运行时,导致该电子设备900执行上面所描述的方法中的各个步骤,具体来说,该计算机可读存储介质存储的计算程序931可以执行上述任一实施例中示出的方法。计算机程序931可以以适当形式进行压缩。FIG. 10 is a schematic structural diagram of a computer-readable storage medium in an embodiment of the present invention. The computer-readable storage medium 1000 stores a computer program 931 for carrying out the method steps according to the invention, which can be read by the processor 910 of the electronic device 900 and, when executed by the electronic device 900, causes the electronic device 900 to To execute each step in the above-described method, specifically, the computing program 931 stored in the computer-readable storage medium can execute the method shown in any of the above-described embodiments. The computer program 931 may be compressed in a suitable form.

It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may devise alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not denote any order. These words can be interpreted as names.

Claims (10)

1. A method for implementing stock investment recommendation, wherein the method comprises:
obtaining a given stock set;
calculating a rise/fall probability for each stock in the stock set;
selecting one or more stocks for investment advice according to the rise/fall probabilities of the stocks in the stock set.

2. The method of claim 1, wherein selecting one or more stocks for investment advice according to the rise/fall probabilities of the stocks in the stock set comprises one or more of the following:
selecting a preset number of stocks that are predicted to rise and have the highest rise probabilities, with the investment weights assigned equally;
selecting a preset number of stocks that are predicted to rise and have the highest rise probabilities, with the investment weights assigned in proportion to the rise probabilities;
selecting, from each stock sector, the one stock that is predicted to rise and has the highest rise probability, with the investment weights assigned equally;
selecting, from each stock sector, the one stock that is predicted to rise and has the highest rise probability, with the investment weights assigned in proportion to the rise probabilities;
selecting, from each stock sector, one or more stocks that are predicted to rise and have the highest rise probabilities, with equal weighting across sectors and weighting in proportion to the rise probabilities among the stocks selected within each sector.

3. The method of claim 1, wherein calculating a rise/fall probability for each stock in the stock set comprises:
obtaining a stock review data set for a single stock s_j;
calculating the rise/fall probability cf(s_j) of that stock according to the following formula:
[formula missing from the extracted text]
where the terms of the formula are: the number of stock review data items in the stock review data set; s_i, one piece of stock review data; the opinion polarity of that piece of stock review data; the reliability index of that piece of stock review data; and rυ(c_i), the accuracy value of the reliability classification of that piece of stock review data;
when cf(s_j) ≥ 0, stock s_j rises, and the probability of rising is |cf(s_j)|;
when cf(s_j) < 0, stock s_j falls, and the probability of falling is |cf(s_j)|.

4. The method of claim 3, wherein the opinion polarity of a piece of stock review data is determined as follows:
obtaining a training set and a validation set composed of stock review data, and labeling the opinion polarity of each piece of stock review data in the training set and the validation set;
training a machine learning model on the labeled training set, and evaluating the performance of the model on the labeled test set, to obtain a trained machine learning model;
inputting the relevant information of the stock review data to be predicted into the trained machine learning model, obtaining the opinion polarity classification information output by the model for that stock review data, and determining the opinion polarity of that stock review data from the opinion polarity classification information.

5. The method of claim 3, wherein the reliability index of a piece of stock review data is determined according to the following formula:
[formula missing from the extracted text]
where the terms of the formula are: the date; the stock price on that date; the stock price on the following day; and the opinion polarity of the stock review.

6. The method of claim 3, wherein the accuracy value rυ(c_i) of the reliability classification of a piece of stock review data is determined as follows:
extracting feature vectors based on a stock review data set and a stock price sequence set;
training a support vector machine (SVM) model with a radial basis function kernel using the extracted feature vectors;
training a machine learning model for predicting stock prices using the stock price sequence set;
integrating the SVM model and the machine learning model for predicting stock prices to obtain a classification model for evaluating the reliability of stock reviews;
wherein a larger value of rυ(c_i) indicates a more reliable classification of the stock review's reliability.

7. The method of claim 6, wherein extracting feature vectors based on the stock review data set and the stock price sequence set comprises:
for each piece of stock review data in at least part of the stock review data set, extracting one or more of the following features to form a feature vector:
the bullish or bearish opinion polarity of that piece of stock review data;
among all stock review data for stock s published on day t, the number of bullish reviews and the number of bearish reviews;
among all stock review data for stock s published within a first preset length of time before day t, the number of bullish reviews, the number of bearish reviews, the number of reviews whose opinion was correct, and the number of reviews whose opinion was wrong;
the price sequence of stock s within a second preset length of time before day t;
the price of stock s on the next trading day as predicted by the machine learning model for predicting stock prices, and the standard deviation of that model's output;
among all stock review data published by stock commentator a within a third preset length of time before day t, the number of bullish reviews, the number of bearish reviews, the number of reviews whose opinion was correct, and the number of reviews whose opinion was wrong;
among the stock review data for stock s published by stock commentator a within a fourth preset length of time before day t, the number of bullish reviews, the number of bearish reviews, the number of reviews whose opinion was correct, and the number of reviews whose opinion was wrong;
one or more of the following, determined from the sequence of stock reviews published by stock commentator a within a fifth preset length of time before day t: the probability OSRatio that commentator a changes opinion; the probability TSRatio of changing opinion given that the opinion was correct; the probability FSRatio of changing opinion given that the opinion was wrong; the probability TCTRatio that the opinion is kept and the kept opinion is correct, given that the opinion was correct; the probability TSTRatio that the opinion is changed and the changed opinion is correct, given that the opinion was correct; the probability FCTRatio that the opinion is kept and the kept opinion is correct, given that the opinion was wrong; and the probability FSTRatio that the opinion is changed and the changed opinion is correct, given that the opinion was wrong;
where the stock commentator of that piece of stock review data is a, the stock reviewed is s, and the publication date is t.

8. A device for implementing stock investment recommendation, wherein the device comprises:
a stock set acquisition unit, adapted to obtain a given stock set;
a rise/fall probability calculation unit, adapted to calculate a rise/fall probability for each stock in the stock set;
a stock investment recommendation unit, adapted to select one or more stocks for investment advice according to the rise/fall probabilities of the stocks in the stock set.

9. An electronic device, characterized in that the electronic device comprises: a processor, and a memory storing a computer program executable on the processor;
wherein the processor is configured to perform the method of any one of claims 1-7 when executing the computer program in the memory.

10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the method of any one of claims 1-7 is implemented.
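
The selection and weighting options of claim 2, together with the sign rule of claim 3, can be illustrated with a minimal Python sketch. It assumes the signed score cf(s_j) has already been computed for each stock, since the formula itself is not recoverable here. The function names (`rising`, `normalise`, `top_n`, `per_sector`), the tie-breaking by input order, the equal-weight fallback when all probabilities are zero, and the choice to give no allocation to a sector with no rising stock are all assumptions made for illustration and do not come from the patent text.

```python
from collections import defaultdict


def rising(scores):
    """Claim 3 sign rule: cf >= 0 means 'rise' with probability |cf|."""
    return {stock: abs(cf) for stock, cf in scores.items() if cf >= 0}


def normalise(weights):
    """Scale weights so they sum to 1 (equal weights if all are zero)."""
    if not weights:
        return {}
    total = sum(weights.values())
    if total == 0:
        return {stock: 1 / len(weights) for stock in weights}
    return {stock: w / total for stock, w in weights.items()}


def top_n(scores, n, by_probability=False):
    """Pick the n rising stocks with the highest rise probability.

    by_probability=False -> equal weights
    by_probability=True  -> weights proportional to rise probability
    """
    probs = rising(scores)
    picked = sorted(probs, key=probs.get, reverse=True)[:n]
    return normalise({s: probs[s] if by_probability else 1.0 for s in picked})


def per_sector(scores, sectors, k=1, by_probability=True):
    """Pick up to k rising stocks per sector.

    Sectors are weighted equally; inside a sector the picked stocks are
    weighted equally or by rise probability.
    """
    probs = rising(scores)
    groups = defaultdict(list)
    for stock in probs:
        groups[sectors[stock]].append(stock)

    portfolio = {}
    for members in groups.values():
        picked = sorted(members, key=probs.get, reverse=True)[:k]
        inner = normalise(
            {s: probs[s] if by_probability else 1.0 for s in picked}
        )
        for stock, weight in inner.items():
            portfolio[stock] = weight / len(groups)
    return portfolio


if __name__ == "__main__":
    # Hypothetical cf scores and sector labels, for illustration only.
    scores = {"A": 0.5, "B": 0.25, "C": -0.5, "D": 0.25}
    sectors = {"A": "tech", "B": "tech", "C": "bank", "D": "energy"}
    print(top_n(scores, 2))
    print(top_n(scores, 2, by_probability=True))
    print(per_sector(scores, sectors, k=2))
```

Calling `per_sector` with `k=1` corresponds to the one-stock-per-sector options of claim 2, and `by_probability` switches between the equal and probability-proportional weightings. In every case the returned weights sum to 1 whenever at least one stock is predicted to rise.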
CN201810942583.8A 2018-08-17 2018-08-17 Method and device for realizing stock investment recommendation Pending CN109300030A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810942583.8A CN109300030A (en) 2018-08-17 2018-08-17 Method and device for realizing stock investment recommendation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810942583.8A CN109300030A (en) 2018-08-17 2018-08-17 Method and device for realizing stock investment recommendation

Publications (1)

Publication Number Publication Date
CN109300030A true CN109300030A (en) 2019-02-01

Family

ID=65165219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810942583.8A Pending CN109300030A (en) 2018-08-17 2018-08-17 Method and device for realizing stock investment recommendation

Country Status (1)

Country Link
CN (1) CN109300030A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503550A (en) * 2019-07-23 2019-11-26 周奕 A kind of stock certificate data analysis system
WO2021103571A1 (en) * 2019-11-25 2021-06-03 华泰证券股份有限公司 Method and apparatus for generating asset investment suggestion information and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190201