Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It is to be understood that the specific embodiments described herein merely illustrate the relevant invention and do not limit it. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that the embodiments in the present application, and the features of those embodiments, may be combined with each other provided there is no conflict. The present application is described in detail below with reference to the attached drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture to which the information acquisition method or apparatus of the present application can be applied.
As shown in fig. 1, the system architecture may include a server 101, a network 102, and a server 103. The network 102 provides the medium for a transmission link between the server 101 and the server 103. The server 103 may be a server that provides network resources such as financial news. The server 101 may employ a web crawler to obtain network resources from the server 103, such as news about companies that issue bonds.
Referring to fig. 2, a flow chart of an embodiment of an information acquisition method according to the present application is shown. The method may be performed by a server, such as server 101 in fig. 1, and accordingly, the apparatus may be provided in a server, such as server 101 in fig. 1. The method comprises the following steps:
Step 201, obtaining information of an entity object corresponding to a financial object, and extracting keywords from the information.
In this embodiment, to predict whether a preset financial event will occur for the financial object, information of the entity object corresponding to the financial object may first be obtained. For example, where the financial object is a bond, the preset financial event is a default event, the entity object corresponding to the financial object is the company issuing the bond, and the information is news about that company, the news about the company may first be obtained in order to predict whether a default event will occur for the bond.
In this embodiment, after the information is acquired, keywords may be extracted from it, and the keywords may be associated with the operating status of the entity object corresponding to the financial object. Taking as an example a financial object that is a bond and an entity object that is the company issuing the bond, the news about the company includes words related to the operating status of the company. For example, if a news item about the progress of the company's investment projects reports that one project is progressing slowly, keywords such as the name of the project, "progress", and "slow" can be extracted.
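As a minimal sketch of the keyword extraction in step 201, assuming the news is English text and that a vocabulary of operating-status terms is available, keywords could be picked out as follows. The application does not specify the extraction technique; the vocabulary `OPERATING_STATUS_TERMS` and the function `extract_keywords` are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical vocabulary of terms tied to a company's operating status.
# The application does not define this list; it is an assumption here.
OPERATING_STATUS_TERMS = {"project", "progress", "slow", "delay", "loss", "lawsuit"}


def extract_keywords(news_text, vocabulary=OPERATING_STATUS_TERMS):
    """Return vocabulary terms found in the news, in order of first appearance."""
    tokens = re.findall(r"[a-z]+", news_text.lower())
    found = []
    for token in tokens:
        if token in vocabulary and token not in found:
            found.append(token)
    return found


news = "The company's harbor project is making slow progress this quarter."
keywords = extract_keywords(news)
print(keywords)  # ['project', 'slow', 'progress']
```

A simple vocabulary match is used here only because it is easy to follow; any extraction method that yields words associated with the operating status of the entity object would fit the description above.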
Step 202, inputting the keywords into a preset logistic regression model to obtain an output result.
In this embodiment, after the keywords are extracted in step 201, a preset logistic regression model may be used to predict whether the financial object will have a preset financial event based on the extracted keywords. For example, the financial object is a bond, the preset financial event is a default event, and whether the default event occurs to the bond can be predicted according to the extracted keywords.
Taking as an example a financial object that is a bond and a preset financial event that is a default event, feature information of a plurality of bonds may be acquired in advance. The feature information of a bond comprises: annotation information indicating whether a default event has occurred for the bond, and the preset financial event keywords found in the information of the entity object corresponding to the bond, for example in news about the company issuing the bond. A logistic regression model may be trained in advance using the feature information of the plurality of bonds to obtain the preset logistic regression model. After training, the preset logistic regression model has a weight, that is, a regression coefficient, for each preset financial event keyword. The weight of each preset financial event keyword indicates how important that keyword is for judging whether a default event will occur for the bond.
In some optional implementation manners of this embodiment, information of an entity object corresponding to a financial object in which a preset financial event occurs may be obtained in advance; dividing the information into a plurality of information sentences, and segmenting the information sentences to obtain a plurality of words; and carrying out cluster analysis on the plurality of words to obtain preset financial event keywords.
Taking as an example a financial object that is a bond and a preset financial event that is a default event, news from within a certain period of time, for example within three years, about companies whose bonds have had a default event may be obtained in advance. The news is divided into a plurality of sentences, and the sentences are segmented to obtain a plurality of words. Cluster analysis can then be performed on the plurality of words to obtain the preset financial event keywords associated with the default event.
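The sentence division, word segmentation, and cluster analysis described above can be sketched as follows. The application does not name a clustering algorithm, so this sketch substitutes a deliberately simple one: words that appear together in at least `min_cooccurrence` sentences are linked, and each connected group of linked words forms a cluster. The English stop-word list, the regular-expression tokenizer, and all function names are assumptions made for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical stop-word list; real word segmentation would differ by language.
STOPWORDS = {"the", "a", "an", "and", "as", "of", "to", "in"}


def split_sentences(text):
    """Divide the information into sentences on '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def segment(sentence):
    """Segment a sentence into lower-case words, dropping stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def cluster_words(tokenized_sentences, min_cooccurrence=2):
    """Group words that share a sentence at least min_cooccurrence times."""
    pair_counts = defaultdict(int)
    for tokens in tokenized_sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1

    parent = {}

    def find(word):
        # Follow parent links to the representative word of the group.
        parent.setdefault(word, word)
        while parent[word] != word:
            word = parent[word]
        return word

    # Link every pair of words that co-occurs often enough.
    for (a, b), count in pair_counts.items():
        if count >= min_cooccurrence:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for word in list(parent):
        clusters[find(word)].add(word)
    return list(clusters.values())


news = ("Payment overdue as losses widen. "
        "Losses widen and payment overdue again. "
        "New factory opened.")
tokenized = [segment(s) for s in split_sentences(news)]
clusters = cluster_words(tokenized)
preset_keywords = sorted(max(clusters, key=len))
print(preset_keywords)  # ['losses', 'overdue', 'payment', 'widen']
```

Taking the largest cluster as the keyword set is also an assumption; the application only states that cluster analysis yields the preset financial event keywords.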
In this embodiment, after the keywords extracted in step 201 are input into the preset logistic regression model, the model obtains an output result according to the weights, that is, the regression coefficients, of the preset financial event keywords that match the extracted keywords. The output result of the preset logistic regression model may be a probability that the preset financial event will occur for the financial object.
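To illustrate how an output result could follow from the matched keyword weights, the sketch below sums the regression coefficients of the matched preset keywords, adds an intercept, and applies the standard logistic function. The coefficient values, the intercept, and the function name `event_probability` are hypothetical and are not taken from the application.

```python
import math

# Hypothetical regression coefficients for preset financial event keywords.
WEIGHTS = {"overdue": 1.8, "lawsuit": 1.2, "slow": 0.6}
INTERCEPT = -2.0  # hypothetical intercept term


def event_probability(extracted_keywords, weights=WEIGHTS, intercept=INTERCEPT):
    """Sum the weights of matched preset keywords and apply the logistic function."""
    matched = set(extracted_keywords) & set(weights)
    score = intercept + sum(weights[k] for k in matched)
    return 1.0 / (1.0 + math.exp(-score))


print(round(event_probability(["overdue", "lawsuit", "project"]), 3))  # 0.731
print(round(event_probability([]), 3))                                 # 0.119
```

Extracted keywords that match no preset financial event keyword, such as "project" above, contribute nothing to the score.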
In some optional implementations of this embodiment, the preset logistic regression model may be constructed in advance in the following manner. A logistic regression model is first constructed, feature information of a plurality of financial objects is acquired, and the feature information is divided into feature information for training and feature information for verification. The plurality of financial objects comprise financial objects for which the preset financial event has occurred and financial objects for which it has not occurred. The feature information of a financial object for which the preset financial event has occurred comprises annotation information indicating that the preset financial event has occurred and the preset financial event keywords in the information of the entity object corresponding to the financial object. The feature information of a financial object for which the preset financial event has not occurred comprises annotation information indicating that the preset financial event has not occurred and the preset financial event keywords in the information of the entity object corresponding to the financial object.
When the logistic regression model is trained using the feature information for training, the annotation information in the feature information may be used as the value of the dependent variable. For example, annotation information of 1 indicates that the preset financial event has occurred for the financial object, and annotation information of 0 indicates that it has not occurred. The preset financial event keywords in the feature information are used as the values of the independent variables, and the logistic regression model is trained to obtain the trained logistic regression model. Each preset financial event keyword in the feature information for training corresponds to one regression coefficient, and the regression coefficient may represent how important that keyword is for determining whether the preset financial event will occur for a financial object.
Then, the preset financial event keywords in each piece of feature information for verification are input into the trained logistic regression model to obtain a plurality of regression results, each indicating whether the preset financial event will occur for the corresponding financial object. The feature information for verification comprises feature information of financial objects for which the preset financial event has occurred and feature information of financial objects for which it has not occurred. Within the feature information for verification, the feature information of financial objects for which the preset financial event has occurred may be referred to as first feature information, and the feature information of financial objects for which the preset financial event has not occurred may be referred to as second feature information.
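A minimal training sketch under stated assumptions: each preset financial event keyword is encoded as a 0/1 independent variable (1 if the keyword appears in the entity object's information), the annotation information is the 0/1 dependent variable, and the coefficients are fitted by plain batch gradient descent on the logistic loss. The application does not state how the model is fitted; the training samples, the hyperparameters, and the function names below are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(samples, keywords, epochs=2000, learning_rate=0.5):
    """Fit one regression coefficient per keyword.

    samples: list of (set of keywords found, annotation) with annotation 1 or 0.
    """
    weights = {k: 0.0 for k in keywords}
    intercept = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = {k: 0.0 for k in keywords}
        grad_b = 0.0
        for found, annotation in samples:
            z = intercept + sum(weights[k] for k in found if k in weights)
            error = sigmoid(z) - annotation  # gradient of the logistic loss w.r.t. z
            grad_b += error
            for k in found:
                if k in grad_w:
                    grad_w[k] += error
        intercept -= learning_rate * grad_b / n
        for k in keywords:
            weights[k] -= learning_rate * grad_w[k] / n
    return weights, intercept


# Hypothetical feature information for training: (keywords found, annotation).
samples = [
    ({"overdue", "lawsuit"}, 1),
    ({"overdue"}, 1),
    ({"lawsuit"}, 1),
    ({"profit"}, 0),
    ({"expansion"}, 0),
    ({"profit", "expansion"}, 0),
]
keywords = ["overdue", "lawsuit", "profit", "expansion"]
weights, intercept = train_logistic_regression(samples, keywords)
for keyword in keywords:
    print(keyword, round(weights[keyword], 2))
```

In this toy data set the keywords that appear only with annotation 1 receive positive coefficients and those that appear only with annotation 0 receive negative ones, which is the sense in which a coefficient reflects a keyword's importance.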
After the plurality of regression results are obtained, the proportion of the first feature information whose regression result is consistent with its annotation information, relative to the total number of pieces of first feature information, may be calculated; this is referred to as the first proportion. In other words, the trained logistic regression model is used to predict, for each financial object in the verification set for which the preset financial event has actually occurred, whether the preset financial event will occur; the first proportion is the number of those financial objects for which the model predicts that the event occurs, divided by the total number of financial objects for which the event has actually occurred.
Likewise, the proportion of the second feature information whose regression result is consistent with its annotation information, relative to the total number of pieces of second feature information, may be calculated; this is referred to as the second proportion. In other words, the trained logistic regression model is used to predict, for each financial object in the verification set for which the preset financial event has not occurred, whether the preset financial event will occur; the second proportion is the number of those financial objects for which the model predicts that the event does not occur, divided by the total number of financial objects for which the event has not occurred.
After the first proportion and the second proportion are calculated, it may be judged whether they meet a preset condition. The preset condition comprises: the first proportion and the second proportion are both greater than a proportion threshold. When the preset condition is met, the trained logistic regression model can be used as the preset logistic regression model.
When the calculated first proportion and second proportion do not meet the preset condition, Bayesian analysis can be carried out on the regression results, and the regression coefficient of each preset financial event keyword is adjusted until the preset condition is met. The logistic regression model with the adjusted regression coefficients is then used as the preset logistic regression model.
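The verification check can be sketched as follows, assuming the annotation information and the regression results have both been reduced to 1 (event occurs) or 0 (event does not occur), and that the verification set contains at least one financial object of each kind. The threshold value 0.7 and the function names are hypothetical; the application does not give a value for the proportion threshold.

```python
def verification_proportions(annotations, predictions):
    """Return (first proportion, second proportion) for a verification set.

    annotations, predictions: parallel lists of 1 (event) and 0 (no event).
    Assumes at least one 1 and at least one 0 in annotations.
    """
    pairs = list(zip(annotations, predictions))
    first_total = sum(1 for y, _ in pairs if y == 1)
    second_total = sum(1 for y, _ in pairs if y == 0)
    first_hits = sum(1 for y, p in pairs if y == 1 and p == 1)
    second_hits = sum(1 for y, p in pairs if y == 0 and p == 0)
    return first_hits / first_total, second_hits / second_total


def meets_preset_condition(first, second, proportion_threshold=0.7):
    """Both proportions must exceed the (hypothetical) proportion threshold."""
    return first > proportion_threshold and second > proportion_threshold


annotations = [1, 1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 1]
first, second = verification_proportions(annotations, predictions)
print(first, second)                          # 0.75 0.8
print(meets_preset_condition(first, second))  # True
```

Requiring both proportions to exceed the threshold means the model must do well on both kinds of financial object; a model that always predicts "no event" would score 1.0 on the second proportion but 0.0 on the first and would be rejected.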
Referring to FIG. 3, an exemplary flow chart for constructing a pre-set logistic regression model is shown.
In the present application, cluster analysis may first be performed on negative samples to obtain the financial event keywords. For example, where the financial object is a bond, the negative samples are news from the last three years about companies whose bonds have had a default event. The news is segmented to obtain a plurality of words appearing in it, and cluster analysis can be performed on the plurality of words to obtain a plurality of financial event keywords.
Next, a logistic regression model is established based on the financial event keywords; the model outputs a probability value between 0 and 1. In this model, each financial event keyword corresponds to a regression coefficient, and the regression coefficient can represent how important that keyword is for judging whether the preset financial event will occur for a financial object. To determine whether the preset financial event will occur for a current financial object, the keywords associated with the operating status of the entity object, extracted from the information of the entity object corresponding to the current financial object, may be input into the logistic regression model, and the model outputs the probability that the preset financial event will occur for the current financial object.
Finally, Bayesian analysis is carried out on the regression result and the regression coefficients are adjusted. When the output result of the established logistic regression model is not satisfactory, Bayesian analysis can be carried out on the regression result, and the regression coefficients of the preset financial event keywords can be adjusted.
Step 203, generating indication information indicating whether the preset financial event will occur for the financial object based on the output result.
In this embodiment, after the keyword of the financial object is input to the preset logistic regression model in step 202 to obtain the output result, the indication information indicating whether the financial object will have the preset financial event may be generated according to the output result. For example, the output result of the preset logistic regression model is a probability indicating that the financial object will have a preset financial event, and according to the probability, it may be determined whether the financial object will have the preset financial event, and indication information indicating whether the financial object will have the preset financial event may be generated.
In some optional implementations of this embodiment, the output result of the preset logistic regression model is a probability that the preset financial event will occur for the financial object. When the probability output by the preset logistic regression model is greater than a probability threshold, indication information indicating that the preset financial event will occur for the financial object may be generated. When the probability output by the preset logistic regression model is smaller than the probability threshold, indication information indicating that the preset financial event will not occur for the financial object may be generated.
Taking a financial object as a bond and a preset financial event as a default event as an example, when the probability output by the preset logistic regression model is greater than a probability threshold, indicating information indicating that the bond will have the default event can be generated. When the probability output by the preset logistic regression model is smaller than the probability threshold value, indication information indicating that the bond does not have default events can be generated.
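For the bond example, step 203 reduces to a comparison against the probability threshold, as in the sketch below. The threshold value 0.5, the wording of the indication information, and the function name are hypothetical. The text above does not say what happens when the probability equals the threshold; this sketch treats that case as "no default event", which is an assumption.

```python
def generate_indication(probability, probability_threshold=0.5):
    """Turn the model's output probability into indication information."""
    if probability > probability_threshold:
        return "default event expected"
    # Assumption: a probability equal to the threshold is treated as no event.
    return "no default event expected"


print(generate_indication(0.73))  # default event expected
print(generate_indication(0.12))  # no default event expected
```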
Referring to fig. 4, a schematic structural diagram of an embodiment of an information acquisition apparatus according to the present application is shown. The information acquisition apparatus includes: an acquiring unit 401, a prediction unit 402, and a generating unit 403. The acquiring unit 401 is configured to acquire information of an entity object corresponding to a financial object, and extract keywords from the information. The prediction unit 402 is configured to input the keywords into a preset logistic regression model to obtain an output result, where the preset logistic regression model is generated by training performed in advance using feature information of a plurality of financial objects, and the feature information includes: annotation information indicating whether the preset financial event has occurred for the financial object, and the preset financial event keywords in the information of the entity object corresponding to the financial object. The generating unit 403 is configured to generate, based on the output result, indication information indicating whether the preset financial event will occur for the financial object.
In some optional implementations of this embodiment, the generating unit 403 includes: an indication information generating subunit (not shown) configured to generate, when the output result is a probability indicating that the financial object will have a preset financial event, indication information indicating that the financial object will have the preset financial event when the probability is greater than a probability threshold; and when the probability is smaller than the probability threshold value, generating indicating information indicating that the preset financial event does not occur to the financial object.
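The division of the apparatus into units 401, 402, and 403 can be pictured as three interchangeable components wired in sequence. The sketch below is only one way to express that structure; the class name, the callable signatures, and the stand-in lambdas (which return fixed values instead of crawling news or evaluating a trained model) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InformationAcquisitionApparatus:
    """Wires together stand-ins for units 401, 402 and 403."""

    acquiring_unit: Callable[[str], List[str]]     # 401: entity name -> keywords
    prediction_unit: Callable[[List[str]], float]  # 402: keywords -> output result
    generating_unit: Callable[[float], str]        # 403: output result -> indication

    def run(self, entity_name: str) -> str:
        keywords = self.acquiring_unit(entity_name)
        output_result = self.prediction_unit(keywords)
        return self.generating_unit(output_result)


# Hypothetical stand-ins, for illustration only.
apparatus = InformationAcquisitionApparatus(
    acquiring_unit=lambda name: ["overdue", "lawsuit"],
    prediction_unit=lambda kws: 0.9 if "overdue" in kws else 0.1,
    generating_unit=lambda p: "event expected" if p > 0.5 else "no event expected",
)
print(apparatus.run("Example Co."))  # event expected
```

Keeping each unit behind a plain callable makes it straightforward to swap in a different keyword extractor or a retrained model without touching the other units.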
In some optional implementations of this embodiment, the information acquisition apparatus further includes: a first model generation unit (not shown) configured to: construct a logistic regression model; acquire feature information of a plurality of financial objects, and divide the feature information into feature information for training and feature information for verification; train the logistic regression model using the feature information for training to obtain a trained logistic regression model, wherein each preset financial event keyword in the feature information for training corresponds to one regression coefficient; determine, in the feature information for verification, a first quantity of first feature information containing annotation information indicating that the preset financial event has occurred for the financial object, and a second quantity of second feature information containing annotation information indicating that the preset financial event has not occurred for the financial object; input the preset financial event keywords in each piece of feature information for verification into the trained logistic regression model to obtain a plurality of regression results indicating whether the preset financial event will occur for the financial objects; calculate a first proportion and a second proportion, wherein the first proportion is the number of pieces of first feature information whose regression result is consistent with the annotation information divided by the first quantity, and the second proportion is the number of pieces of second feature information whose regression result is consistent with the annotation information divided by the second quantity; judge whether the first proportion and the second proportion meet a preset condition, wherein the preset condition comprises: the first proportion and the second proportion are both greater than a proportion threshold; and, when the first proportion and the second proportion meet the preset condition, take the trained logistic regression model as the preset logistic regression model; a second model generation unit (not shown) configured to: perform Bayesian analysis on the regression results when the first proportion and the second proportion do not meet the preset condition, and adjust the regression coefficient of each preset financial event keyword until the preset condition is met; and take the logistic regression model with the adjusted regression coefficients as the preset logistic regression model; and a keyword obtaining unit (not shown) configured to: obtain information of an entity object corresponding to a financial object for which the preset financial event has occurred; divide the information into a plurality of sentences, and segment the sentences to obtain a plurality of words; and perform cluster analysis on the plurality of words to obtain the preset financial event keywords.
The present application also provides a server, which may comprise the information acquisition apparatus described with reference to fig. 4. The server may be configured with one or more processors and a memory for storing one or more programs, wherein the one or more programs may include instructions for performing the operations described in steps 201 to 203 above. The one or more programs, when executed by the one or more processors, cause the one or more processors to perform the operations described in steps 201 to 203 above.
Fig. 5 shows a schematic structural diagram of a server suitable for implementing the information acquisition method according to the embodiment of the present application.
As shown in fig. 5, the server includes a Central Processing Unit (CPU) 501, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The CPU 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506; an output section 507; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read from it can be installed into the storage section 508 as necessary.
The processes described in the above-described respective steps in the present application may be implemented as a computer program. The computer program may be carried on a computer readable medium, the computer program comprising instructions for carrying out the method illustrated in the flow chart. The computer program can be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511.
The present application also provides a computer readable medium, which may be included in a server, or may exist separately without being assembled into the server. The computer readable medium carries one or more programs which, when executed by the server, cause the server to: acquire information of an entity object corresponding to a financial object, and extract keywords from the information; input the keywords into a preset logistic regression model to obtain an output result, wherein the preset logistic regression model is generated by training performed in advance using feature information of a plurality of financial objects, and the feature information comprises: annotation information indicating whether the preset financial event has occurred for the financial object, and the preset financial event keywords in the information of the entity object corresponding to the financial object; and generate, based on the output result, indication information indicating whether the preset financial event will occur for the financial object.
It should be noted that the computer readable medium can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the present application. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.