WO2018196553A1 - Method and apparatus for obtaining identifier, storage medium, and electronic device - Google Patents
- Publication number
- WO2018196553A1 (PCT/CN2018/081337)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- identifier
- target
- feature
- preset
- initial
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Recommending goods or services
Definitions
- the present invention relates to the field of computers, and in particular to a method and device for acquiring an identifier, a storage medium, and an electronic device.
- in the related art, sample representation words and optimization rules are prepared, and pattern matching (regular-expression matching) is applied to a single user behavior log to mine the population matching the sample representation words.
- the population with the characteristics of the sample representation words is selected as the positive sample population of the training data, and the negative sample population is drawn at random from the overall population after the positive sample population has been excluded.
- because only a single user behavior log is used, the population that can be found by matching is limited and the samples are biased.
- the purity and reliability of the positive sample population mined through pattern matching cannot be demonstrated.
- these defects mean that the existing way of acquiring training data samples yields identifiers for training with low accuracy.
- the embodiments of the invention provide a method and an apparatus for acquiring an identifier, a storage medium, and an electronic device, so as to at least solve the technical problem in the related art that identifiers obtained for training have low accuracy.
- a method for obtaining an identifier includes: obtaining identifiers corresponding to a predetermined operation from a plurality of data sources, wherein a target data source included in the plurality of data sources records the account corresponding to each identifier and the predetermined operation performed by the account; obtaining an initial identifier from the identifiers according to feature information of the identifiers and a preset feature word, wherein the feature information indicates a feature of the predetermined operation; determining a feature parameter of the initial identifier according to a preset weight and the feature information, wherein the preset weight corresponds to the target data source and indicates the frequency at which accounts in the target data source perform the predetermined operation, and the feature parameter indicates the frequency at which the initial identifier performs the predetermined operation; and obtaining a first target identifier from the initial identifier, wherein the first target identifier is the set of identifiers in the initial identifier whose feature parameter is higher than a preset parameter.
- an apparatus for acquiring an identifier includes the following modules. A first acquiring module is configured to acquire identifiers corresponding to a predetermined operation from a plurality of data sources, wherein a target data source included in the plurality of data sources records the account corresponding to each identifier and the predetermined operation performed by the account. A second obtaining module is configured to obtain an initial identifier from the identifiers according to feature information of the identifiers and a preset feature word, wherein the feature information represents a feature of the predetermined operation. A determining module is configured to determine a feature parameter of the initial identifier according to a preset weight and the feature information, wherein the preset weight indicates the frequency at which accounts in the target data source perform the predetermined operation, and the feature parameter indicates the frequency at which the initial identifier performs the predetermined operation.
- the apparatus further includes a third obtaining module, configured to acquire a first target identifier from the initial identifier, wherein the first target identifier is the set of identifiers in the initial identifier whose feature parameter is higher than a preset parameter.
- a storage medium is further provided, comprising a stored program, wherein when the program runs, it controls the device on which the storage medium resides to perform the above method for acquiring an identifier.
- in the embodiments of the invention, identifiers corresponding to the predetermined operation are obtained from the plurality of data sources, wherein the target data source included in the plurality of data sources records the account corresponding to each identifier and the predetermined operation performed by the account;
- an initial identifier is obtained from the identifiers according to their feature information and the preset feature word, wherein the feature information represents a feature of the predetermined operation; the feature parameter of the initial identifier is determined according to the preset weight and the feature information, wherein the preset weight corresponds to the target data source and indicates the frequency at which accounts in the target data source perform the predetermined operation, and the feature parameter indicates the frequency at which the initial identifier performs the predetermined operation; and the first target identifier is obtained from the initial identifier, wherein the first target identifier is the set of identifiers in the initial identifier whose feature parameter is higher than the preset parameter.
- because the target data source records the account corresponding to each identifier and the predetermined operation performed by the account, obtaining the identifiers corresponding to the predetermined operation from it broadens the sources of identifiers. It also avoids the bias of identifiers obtained from a single user log.
- the initial identifier is preliminarily selected from the identifiers according to their feature information and the preset feature word. The feature parameter, which indicates how frequently the initial identifier performs the predetermined operation, is then determined according to the preset weight and the feature information.
- the first target identifier, whose feature parameter is higher than the preset parameter, is then obtained from the initial identifier. As a result, the first target identifier contains identifiers that perform the predetermined operation frequently. This improves the accuracy of the identifiers obtained for training and thereby overcomes the problem of low accuracy in obtaining identifiers for training in the related art.
- FIG. 1 is a schematic diagram of a method for acquiring an identifier according to the related art.
- FIG. 2 is a schematic diagram of an application environment of an optional method for acquiring an identifier according to an embodiment of the present invention.
- FIG. 3 is a schematic diagram of an optional method for acquiring an identifier according to an embodiment of the present invention.
- FIG. 4 is a first schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention.
- FIG. 5 is a second schematic diagram of an apparatus for acquiring an identifier according to an embodiment of the present invention.
- FIG. 6 is a third schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention.
- FIG. 7 is a fourth schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention.
- FIG. 8 is a fifth schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention.
- FIG. 9 is a sixth schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention.
- FIG. 10 is a schematic diagram of an application scenario of an optional method for acquiring an identifier according to an embodiment of the present invention.
- FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
- an embodiment of the foregoing method for acquiring an identifier is provided.
- the method for obtaining the identifier may be applied to, but is not limited to, the application environment shown in FIG. 2. There, the server 202 is configured to obtain identifiers corresponding to the predetermined operation from the plurality of data sources. The target data source included in the plurality of data sources records the account corresponding to each identifier and the operation performed by the account.
- the feature information indicates the feature of the predetermined operation. The preset weight corresponds to the target data source and indicates the frequency at which accounts in the target data source perform the predetermined operation. The feature parameter indicates the frequency at which the initial identifier performs the predetermined operation. The first target identifier is the set of identifiers in the initial identifier whose feature parameters are higher than the preset parameter.
- because the target data source records the account corresponding to each identifier and the operation performed by the account, the server 202 can obtain the identifiers corresponding to the predetermined operation from a wider range of sources, avoiding reliance on a single user log.
- the server 202 is configured to: acquire a first feature word and a second feature word, wherein the preset feature word includes the first feature word and the second feature word; and obtain the initial identifier from the identifiers, wherein the feature information corresponding to the initial identifier carries the first feature word and does not carry the second feature word.
- the server 202 is configured to do the following:
  - obtain the preset weight, wherein a larger preset weight indicates that accounts in the target data source perform the predetermined operation more frequently;
  - obtain time information and frequency information from the feature information, wherein the time information indicates when the predetermined operation is performed and the frequency information indicates how frequently the identifier performs the predetermined operation;
  - determine the feature parameter according to the preset weight, the time information, and the frequency information, wherein a larger feature parameter indicates that the initial identifier performs the predetermined operation more frequently.
- the server 202 is configured to assign the preset weight in one of two ways:
  - acquire the proportion of accounts in the target data source that perform the predetermined operation, and allocate a preset weight to the target data source according to that proportion, wherein the larger the proportion, the greater the preset weight allocated to the data source; or
  - acquire the number of identical identifiers in a first identifier set and a preset identifier set, wherein the first identifier set is the set of identifiers in the initial identifier that are included in the target data source, and allocate a preset weight to the target data source according to the ratio between that number and the number of identifiers in the first identifier set, wherein the larger the ratio, the greater the preset weight allocated to the data source.
- the server 202 is configured to: calculate, for each target data source, the product of the time information and the frequency information of the initial identifier in that source; and calculate a weighted sum of the products using the preset weights to obtain the feature parameter.
- the server 202 is configured to: acquire, from the predetermined operation corresponding to the identifier, information indicating a feature of the predetermined operation, wherein this information includes the feature word corresponding to the predetermined operation, time information, and frequency information; and store the feature word, time information, and frequency information in a preset format to obtain the feature information.
- the server 202 is configured to: sort the initial identifiers by feature parameter from high to low and select the first target identifier from the sorted identifiers, wherein the first target identifier consists of the top N identifiers in the sorted order; or obtain, from the initial identifier, the first target identifier whose feature parameter is greater than or equal to a preset value.
- the server 202 is configured to: match the first target identifier against a preset target identifier; if the match succeeds, determine that the first target identifier is the required identifier; and if the match fails, re-acquire the first target identifier.
- the server 202 is further configured to: determine whether the first target identifier and the preset target identifier have at least a preset number of identical identifiers; and if they do, determine that the first target identifier successfully matches the preset target identifier.
- the server 202 is further configured to: obtain the identifiers corresponding to the accounts included in the multiple data sources; and randomly obtain, from those identifiers, a second target identifier consisting of identifiers other than the first target identifier, wherein the second target identifier contains the same number of identifiers as the first target identifier.
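The negative-sample step above can be sketched in Python. This is a minimal sketch under our own assumptions. Identifiers are plain strings. The function name `sample_negative_identifiers` and the `seed` parameter are hypothetical. Uniform sampling with the standard library's `random.Random.sample` stands in for whatever random selection the embodiment actually uses.

```python
import random
from typing import Iterable, List, Optional


def sample_negative_identifiers(
    all_identifiers: Iterable[str],
    first_target: Iterable[str],
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: draw a second target identifier set (negative
    samples) of the same size as the first target identifier set, using only
    identifiers that are NOT in the first target identifier set."""
    positives = set(first_target)
    # Sort the candidate pool so that a fixed seed gives reproducible draws.
    candidates = sorted(set(all_identifiers) - positives)
    if len(candidates) < len(positives):
        raise ValueError("not enough non-positive identifiers to sample from")
    rng = random.Random(seed)
    return rng.sample(candidates, len(positives))


all_ids = [f"u{i}" for i in range(10)]
positives = ["u1", "u2", "u3"]
negatives = sample_negative_identifiers(all_ids, positives, seed=7)
```

Matching the sizes of the two sets keeps the training data balanced between positive and negative samples, which is what the equal-count condition above requires.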
- the application environment may further include a client connected to the server 202 through a network. The server 202 is further configured to:
  - train a prediction model according to the first target identifier and the second target identifier;
  - obtain, according to the prediction model, a to-be-pushed identifier for a to-be-pushed resource from the identifiers of the plurality of data sources;
  - push the to-be-pushed resource to the client used by the account corresponding to the to-be-pushed identifier.
- the foregoing client may include, but is not limited to, at least one of the following: a mobile phone, a tablet computer, a notebook computer, a desktop PC, a digital television, and other hardware devices for area sharing.
- the above network may include, but is not limited to, at least one of the following: a wide area network, a metropolitan area network, and a local area network. The above is only an example, and the embodiment does not limit this.
- a method for obtaining an identifier includes:
- the identifiers corresponding to the predetermined operation are obtained from the plurality of data sources, wherein the target data source included in the plurality of data sources records the account corresponding to each identifier and the predetermined operation performed by the account;
- S306. Determine a feature parameter of the initial identifier according to the preset weight and the feature information, wherein the preset weight corresponds to the target data source and indicates the frequency at which accounts in the target data source perform the predetermined operation, and the feature parameter indicates the frequency at which the initial identifier performs the predetermined operation;
- the method for acquiring the identifier may be applied to, but is not limited to, a scenario in which identifier samples are acquired for model training and the training result is used to push resources to a client.
- the above client may be, but not limited to, various types of software, such as search software, social software, instant messaging software, news information software, game software, shopping software, and the like.
- for example, the method may be applied to, but is not limited to, either of two scenarios. In both, the acquired identifier samples are used for model training. In the first, the training result is used by the client of shopping software to push resources; in the second, it is used to push resources to the client of search software.
- the above is merely an example, and this embodiment is not limited thereto.
- the multiple data sources may be various platforms, software, websites, applications, and the like.
- for example: social applications, search engines, e-commerce websites, advertising platforms, etc.
- an identifier may correspond to different accounts in different data sources.
- for example, a user may have registered accounts on multiple applications: account A on a social platform, account B on a shopping website, and account C on an instant messaging application. If the user has associated these three accounts, accounts A, B, and C can together be used to uniquely identify the user.
- the target data source may include one or more data sources, namely the data sources that record the account corresponding to the identifier and the operation performed by that account.
- the identifier corresponding to the predetermined operation may be recorded in one of the plurality of data sources, or may be recorded in several of the plurality of data sources.
- the predetermined operation may be a behavior performed by the identified user, or a phrase characterizing that behavior. For example, if the users to be mined are users who purchase maternal and infant products, the predetermined operation may be "clicking an item containing milk powder or diapers", or phrases such as "milk powder" and "diaper".
- to obtain the identifiers corresponding to the predetermined operation from the plurality of data sources, the following accounts may first be obtained:
  - accounts that searched for "milk powder" and "diaper" in a search engine;
  - accounts that purchased milk powder or diapers on a shopping website;
  - accounts that sent messages containing "milk powder", "diaper", and the like in instant messaging software;
  - accounts that clicked items containing milk powder or diapers in the multiple data sources.
- the identifiers corresponding to these accounts are then obtained.
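The collection step in the milk-powder example can be sketched in Python. This is an illustrative sketch only. The record shapes, the `account_to_identifier` linkage table, the source names, and the function `collect_identifiers` are all hypothetical. "Performing the predetermined operation" is simplified to a behavior record that mentions one of the keywords.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input shapes: each data source maps to (account, behaviour text)
# records, and account_to_identifier links (source, account) to one identifier.
Records = Dict[str, List[Tuple[str, str]]]


def collect_identifiers(
    sources: Records,
    account_to_identifier: Dict[Tuple[str, str], str],
    operation_keywords: Iterable[str],
) -> Dict[str, Set[str]]:
    """Return identifier -> set of data sources in which one of its accounts
    performed the predetermined operation (here: behaviour mentioning a keyword)."""
    keywords = list(operation_keywords)
    found: Dict[str, Set[str]] = {}
    for source, records in sources.items():
        for account, behaviour in records:
            if not any(keyword in behaviour for keyword in keywords):
                continue
            identifier = account_to_identifier.get((source, account))
            if identifier is not None:  # skip accounts not linked to an identifier
                found.setdefault(identifier, set()).add(source)
    return found


sources = {
    "search_engine": [("A1", "searched milk powder")],
    "shopping_site": [("B7", "bought diaper"), ("B8", "bought shoes")],
    "instant_messaging": [("C3", "sent: diaper deals")],
}
links = {
    ("search_engine", "A1"): "user1",
    ("shopping_site", "B7"): "user1",
    ("shopping_site", "B8"): "user2",
    ("instant_messaging", "C3"): "user3",
}
found = collect_identifiers(sources, links, ["milk powder", "diaper"])
```

Here `user1` is found through two data sources because its search-engine and shopping accounts are linked. This is the benefit of drawing on several sources rather than a single user log.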
- the initial identifier may include, but is not limited to, one or more identifiers.
- the preset feature words may include, but are not limited to, one or more feature words.
- the first target identifier may include, but is not limited to, one or more identifiers.
- the preset weight may be used to indicate a frequency at which an account in the target data source performs a predetermined operation.
- the preset weight can be used to indicate the degree of attention of the account in the target data source to the predetermined operation, which can be represented by, but not limited to, the frequency with which the account in the target data source performs the predetermined operation.
- the frequency at which accounts in the target data source perform the predetermined operation may be represented by, but is not limited to, the share of accounts in the target data source that perform the predetermined operation frequently. For example, accounts that perform the predetermined operation more than 5 times per day might make up 50% of all accounts in the data source.
- alternatively, the frequency at which accounts in the target data source perform the predetermined operation may be indicated by the significance of the predetermined operation among accounts in the target data source.
- this significance can be determined as a proportion. Take the identifiers in the initial identifier whose accounts are recorded in the target data source. The significance is the share of these identifiers that also appear in a preset identifier set (for example, the identifiers to which resources were last pushed).
- the preset weight may be set directly according to the frequency at which accounts in the target data source perform the predetermined operation, or may be calculated from that frequency, for example by training a model.
- the target data source records the account corresponding to each identifier and the operation performed by the account. Obtaining the identifiers corresponding to the predetermined operation from the target data source therefore broadens the sources of identifiers and avoids the bias of identifiers obtained from a single user log.
- the initial identifier is preliminarily selected according to the feature information of the identifiers and the preset feature word. The feature parameter, which indicates how frequently the initial identifier performs the predetermined operation, is then determined according to the preset weight and the feature information.
- the first target identifier, whose feature parameter is higher than the preset parameter, is then obtained from the initial identifier. It therefore contains identifiers that perform the predetermined operation frequently. This improves the accuracy of the identifiers obtained for training and thereby overcomes the problem of low accuracy in obtaining identifiers for training in the related art.
- obtaining the initial identifier from the identifiers according to the feature information of the identifiers and the preset feature word includes:
- S1. Acquire a first feature word and a second feature word, wherein the preset feature word includes the first feature word and the second feature word;
- S2. Obtain the initial identifier from the identifiers, wherein the feature information corresponding to the initial identifier carries the first feature word and does not carry the second feature word.
- the preset feature words may include, but are not limited to, a first feature word and a second feature word.
- the preset feature words can be used to represent the characteristics of a type of user population, and may include positive representation words and negative representation words.
- positive representation words (equivalent to the first feature word above) are keywords in the ordinary sense and are used to represent the feature population.
- negative representation words (equivalent to the second feature word above) are filter words (filter_words) whose role is denoising. They remove noise introduced when multiple words are spliced together, so that the positive representation words better represent the feature population.
- the identifiers are thus preliminarily screened: the initial identifier is obtained from them according to their feature information and the first and second feature words included in the preset feature word.
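Steps S1 and S2 of this screening can be sketched in Python. The sketch makes some assumptions of its own:
- feature information is modelled as a list of feature-word strings per identifier;
- "carries" is read as substring containment;
- the names `carries` and `select_initial_identifiers` and the sample words are hypothetical.

```python
from typing import Dict, Iterable, List


def carries(words: Iterable[str], vocabulary: Iterable[str]) -> bool:
    """True if any feature word contains any term from the vocabulary."""
    vocab = list(vocabulary)
    return any(term in word for word in words for term in vocab)


def select_initial_identifiers(
    feature_info: Dict[str, List[str]],
    first_feature_words: Iterable[str],   # positive representation words
    second_feature_words: Iterable[str],  # negative representation (filter) words
) -> List[str]:
    """Keep identifiers whose feature information carries a first feature word
    and does not carry any second feature word."""
    positives = list(first_feature_words)
    filters = list(second_feature_words)
    return [
        identifier
        for identifier, words in feature_info.items()
        if carries(words, positives) and not carries(words, filters)
    ]


feature_info = {
    "u1": ["infant milk powder"],
    "u2": ["milk powder factory jobs"],  # spliced phrase: noise for this population
    "u3": ["running shoes"],
}
initial = select_initial_identifiers(feature_info, ["milk powder"], ["factory"])
```

The filter word "factory" drops `u2`, whose phrase contains the positive keyword only as part of an unrelated spliced phrase. This mirrors the denoising role described for negative representation words.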
- determining the feature parameters of the initial identifier according to the preset weight and the feature information includes:
- S1. Obtain the preset weight, wherein a larger preset weight indicates that accounts in the target data source perform the predetermined operation more frequently;
- S2. Obtain time information and frequency information from the feature information, wherein the time information indicates when the predetermined operation is performed and the frequency information indicates how frequently the predetermined operation is performed;
- S3. Determine the feature parameter according to the preset weight, the time information, and the frequency information, wherein a larger feature parameter indicates that the initial identifier performs the predetermined operation more frequently.
- the preset weight may be obtained by one of the following methods:
- Manner 1: obtain the proportion, among all accounts included in the target data source, of the accounts that perform the predetermined operation. Assign a preset weight to the target data source according to this proportion: the larger the proportion, the greater the preset weight assigned to the data source.
- for example, suppose there are three target data sources: A, B, and C.
  - target data source A contains 100 accounts, 34 of which have performed the predetermined operation;
  - target data source B contains 200 accounts, 25 of which have performed the predetermined operation;
  - target data source C contains 100 accounts, 56 of which have performed the predetermined operation.
- the proportions for A, B, and C are therefore 34%, 12.5%, and 56%, respectively. According to these proportions, target data sources A, B, and C are assigned preset weights 2, 1, and 3.
- Manner 2: obtain the number of identical identifiers in a first identifier set and a preset identifier set, wherein the first identifier set is the set of identifiers in the initial identifier that are included in a target data source. Assign a preset weight to the target data source according to the ratio between this number and the number of identifiers in the first identifier set: the larger the ratio, the greater the preset weight assigned to the data source.
- the preset identifier set may be, but is not limited to, one of the following:
  - the identifiers included in the target data source within the previously acquired first target identifier;
  - the identifiers included in the target data source among the identifiers to which data was previously pushed.
- take as an example the case where the preset identifier set consists of the identifiers of the target data source in the previously acquired first target identifier:
  - the preset identifier set A corresponding to target data source A includes 40 identifiers;
  - the preset identifier set B corresponding to target data source B includes 30 identifiers;
  - the preset identifier set C corresponding to target data source C includes 40 identifiers.
- the initial identifier includes 20, 40, and 40 identifiers from target data sources A, B, and C, respectively. That is:
  - the first identifier set A corresponding to target data source A includes 20 identifiers;
  - the first identifier set B corresponding to target data source B includes 40 identifiers;
  - the first identifier set C corresponding to target data source C includes 40 identifiers.
- matching each first identifier set against its preset identifier set gives the number of identical identifiers in each pair:
  - first identifier set A against preset identifier set A: 10 identical identifiers;
  - first identifier set B against preset identifier set B: 5 identical identifiers;
  - first identifier set C against preset identifier set C: 20 identical identifiers.
- according to these numbers of identical identifiers, target data sources A, B, and C are assigned preset weights 2, 1, and 3, respectively.
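Both weighting manners can be sketched in Python and checked against the two worked examples. The text does not state how proportions or counts are mapped to weight values. The mapping below is a hypothetical one: the weight equals the rank (1..n) in ascending order. It reproduces the weights 2, 1, and 3 in both examples.

For Manner 2, the ratios in the example are 10/20 = 0.5, 5/40 = 0.125, and 20/40 = 0.5. Ranking by ratio would therefore tie A and C. The example's wording ("according to the obtained number of the same identifiers") suggests ranking by count, which this sketch does. The function names and the synthetic identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple


def rank_weights(values: Dict[str, float]) -> Dict[str, int]:
    """Hypothetical mapping: weight = rank (1..n) in ascending order of value."""
    ordered = sorted(values, key=values.get)
    return {source: rank for rank, source in enumerate(ordered, start=1)}


def manner1(stats: Dict[str, Tuple[int, int]]):
    """stats: source -> (accounts that performed the operation, total accounts)."""
    proportions = {s: done / total for s, (done, total) in stats.items()}
    return proportions, rank_weights(proportions)


def manner2(first_sets: Dict[str, Set[str]], preset_sets: Dict[str, Set[str]]):
    """Count identifiers shared by each first identifier set and its preset set."""
    counts = {s: len(first_sets[s] & preset_sets[s]) for s in first_sets}
    ratios = {s: counts[s] / len(first_sets[s]) for s in first_sets}
    # Ranking by count reproduces the weights 2, 1, 3 of the worked example;
    # ranking by ratio would tie A and C (both 0.5).
    return counts, ratios, rank_weights(counts)


# Manner 1 example: 34/100, 25/200 and 56/100 accounts.
proportions, weights1 = manner1({"A": (34, 100), "B": (25, 200), "C": (56, 100)})

# Manner 2 example: synthetic identifiers chosen to give the stated set sizes.
first = {
    "A": {f"a{i}" for i in range(20)},
    "B": {f"b{i}" for i in range(40)},
    "C": {f"c{i}" for i in range(40)},
}
preset = {
    "A": {f"a{i}" for i in range(10)} | {f"pa{i}" for i in range(30)},
    "B": {f"b{i}" for i in range(5)} | {f"pb{i}" for i in range(25)},
    "C": {f"c{i}" for i in range(20)} | {f"pc{i}" for i in range(20)},
}
counts, ratios, weights2 = manner2(first, preset)
```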
- the feature parameter may be determined as follows: calculate the product of the time information and the frequency information of the initial identifier in each target data source, and then calculate the weighted sum of these products according to the preset weights to obtain the feature parameter.
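Written as a formula, the weighted sum just described becomes the expression below for an initial identifier $u$. The symbols $u$, $\mathrm{time}$, and $\mathrm{freq}$ are our own labels, because the original formula did not survive extraction:

$$
\mathrm{score}(u) = \sum_{i=1}^{n} \mathrm{weight}_{\mathrm{source}_i} \cdot \mathrm{time}_{\mathrm{source}_i}(u) \cdot \mathrm{freq}_{\mathrm{source}_i}(u)
$$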
- here, source denotes a data source, and there are n data sources; weight denotes the preset weight of each data source.
- time denotes the above time information. It can be derived from abs(user behavior occurrence time - current mining time), i.e., the absolute time difference of the behavior, and serves as a time decay parameter for the user behavior. The closer the behavior is to the current time, the larger the feature parameter; the farther it is from the current time, the smaller the feature parameter.
- the remaining factor denotes the above frequency information and indicates how frequently the user behavior occurs.
- this frequency is normalized with the sigmoid function, so that the more frequent the behavior, the higher the feature parameter.
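A minimal Python sketch of this scoring follows. It makes several assumptions.
- The text says only that the time term decays as abs(behavior time - mining time) grows. An exponential decay with a hypothetical scale `decay_days` is used here.
- Each data source contributes one (time, count) pair per identifier.
- The frequency term is the logistic sigmoid of the raw count.

`feature_parameter` and its arguments are illustrative names, not from the source.

```python
import math
from datetime import datetime
from typing import Dict, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, used here to squash a raw behaviour count into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def feature_parameter(
    per_source: Dict[str, Tuple[datetime, int]],
    weights: Dict[str, float],
    now: datetime,
    decay_days: float = 30.0,  # hypothetical decay scale, not from the source
) -> float:
    """Weighted sum over data sources of (time term x frequency term).

    per_source maps a data source to (time of the behaviour, behaviour count)
    for one initial identifier; weights holds the preset weight per source.
    """
    score = 0.0
    for source, (when, count) in per_source.items():
        # abs(behaviour time - current mining time), in days.
        gap_days = abs((now - when).total_seconds()) / 86400.0
        time_term = math.exp(-gap_days / decay_days)  # more recent -> closer to 1
        freq_term = sigmoid(count)  # more frequent -> closer to 1
        score += weights[source] * time_term * freq_term
    return score


now = datetime(2018, 4, 1)
weights = {"A": 2.0, "C": 3.0}
recent = feature_parameter({"A": (datetime(2018, 3, 31), 10)}, weights, now)
stale = feature_parameter({"A": (datetime(2017, 4, 1), 10)}, weights, now)
rare = feature_parameter({"A": (datetime(2018, 3, 31), 1)}, weights, now)
```

The sketch shows the three effects described above: recent behavior outscores old behavior, frequent behavior outscores rare behavior, and the same behavior counts more in a source with a larger preset weight.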
- determining the feature parameter of the initial identifier according to the preset weight and the feature information scores the initial identifier and measures how frequently it performs the predetermined operation. The first target identifier selected from the initial identifier therefore better represents the predetermined operation. This improves the accuracy of the identifiers obtained for training and thereby overcomes the problem of low accuracy in the related art.
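- before the initial identifier is obtained from the identifiers according to their feature information and the preset feature word, the method further includes:
- S1. Acquire, from the predetermined operation corresponding to the identifier, information indicating a feature of the predetermined operation, including the feature word corresponding to the predetermined operation, time information, and frequency information;
- S2. Store the feature word, time information, and frequency information in a preset format to obtain the feature information.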
- the information indicating the feature of the predetermined operation is acquired from the predetermined operation corresponding to the identifier. Organizing it into a preset format for storage makes comparing feature words faster and more convenient.
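The text does not specify the preset format. The following Python sketch is one hypothetical choice: one tab-separated line per identifier and data source, holding the feature word, an ISO-8601 time, and a frequency count. The class `FeatureRecord` and its field layout are our own assumptions. The format also assumes that feature words contain no tab or newline characters.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FeatureRecord:
    """Hypothetical preset format: one tab-separated line per identifier and
    data source. Assumes feature words contain no tab or newline characters."""

    identifier: str
    source: str
    feature_word: str
    time: datetime
    frequency: int

    def to_line(self) -> str:
        return "\t".join(
            [self.identifier, self.source, self.feature_word,
             self.time.isoformat(), str(self.frequency)]
        )

    @classmethod
    def from_line(cls, line: str) -> "FeatureRecord":
        identifier, source, word, time, frequency = line.rstrip("\n").split("\t")
        return cls(identifier, source, word, datetime.fromisoformat(time), int(frequency))


record = FeatureRecord("user1", "search_engine", "milk powder", datetime(2018, 3, 31, 9, 30), 3)
line = record.to_line()
```

A fixed column layout lets the feature-word column be compared directly, without re-parsing raw behavior logs.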
- obtaining the first target identifier from the initial identifier includes one of the following:
- the initial identifiers are sorted by feature parameter from high to low, and the first target identifier is selected from the sorted identifiers, wherein the first target identifier consists of the top N identifiers in the sorted order;
- that is, the feature parameters may be sorted from high to low, and the top N identifiers are taken as the identifiers whose feature parameters are higher than the preset parameter, giving the first target identifier.
- alternatively, a preset value may be set, and the identifiers whose feature parameter is greater than or equal to the preset value are taken as the first target identifier.
- obtaining the first target identifier by sorting the feature parameters from high to low, or setting the preset value can clearly select an identifier that is more representative of the predetermined operation from the initial identifier.
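The two selection rules (top N by feature parameter, or a threshold on the feature parameter) can be sketched as below. Here `scores` is a hypothetical mapping from each initial identifier to its feature parameter; the identifiers, scores, N, and threshold are illustrative values, not data from the text.

```python
def select_top_n(scores: dict[str, float], n: int) -> list[str]:
    """Rule 1: sort by feature parameter, high to low, and keep the top N identifiers."""
    ranked = sorted(scores, key=lambda ident: scores[ident], reverse=True)
    return ranked[:n]


def select_by_threshold(scores: dict[str, float], preset_value: float) -> list[str]:
    """Rule 2: keep identifiers whose feature parameter is >= the preset value."""
    return [ident for ident, score in scores.items() if score >= preset_value]


scores = {"u1": 0.92, "u2": 0.41, "u3": 0.77, "u4": 0.65}
print(select_top_n(scores, 2))            # ['u1', 'u3']
print(select_by_threshold(scores, 0.65))  # ['u1', 'u3', 'u4']
```

Top N fixes the size of the positive set; the threshold fixes a minimum score and lets the size vary with the data.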
- The method further includes:
- matching the first target identifier with the preset target identifier; if they are successfully matched, determining that the first target identifier is the required identifier; if they are not successfully matched, re-acquiring the first target identifier.
- The first target identifier is matched with the preset target identifier by determining whether the two contain at least a preset number of identical identifiers. If they do, the first target identifier is determined to be successfully matched with the preset target identifier.
- The preset target identifier may be the first target identifier acquired last time, or a target identifier set specified in advance.
- When the first target identifier is re-acquired, this may be done by, but is not limited to, resetting the predetermined operation and re-acquiring the identifiers corresponding to it, or re-assigning the preset weights of the target data sources.
- The first target identifier is matched with the preset target identifier. If the match succeeds, the currently acquired first target identifier can be considered to meet the needs of model training, that is, it is the required identifier. If the match fails, the currently acquired first target identifier does not meet the needs of model training, and the first target identifier may be re-acquired.
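The match-or-retry loop above can be sketched as follows. A match means the two sets share at least `preset_number` identifiers. `acquire` is a hypothetical callback standing in for re-running the pipeline (for example after resetting the predetermined operation or re-assigning the preset weights), and the retry cap `max_rounds` is an assumption, since the text does not bound the number of re-acquisitions.

```python
from typing import Callable, Optional


def is_match(first_target: set[str], preset_target: set[str], preset_number: int) -> bool:
    """Successful match: at least `preset_number` identifiers appear in both sets."""
    return len(first_target & preset_target) >= preset_number


def acquire_required_identifier(
    acquire: Callable[[int], set[str]],
    preset_target: set[str],
    preset_number: int,
    max_rounds: int = 5,  # ASSUMPTION: arbitrary retry limit
) -> Optional[set[str]]:
    """Re-acquire the first target identifier until it matches, or give up."""
    for round_index in range(max_rounds):
        candidate = acquire(round_index)
        if is_match(candidate, preset_target, preset_number):
            return candidate  # the required identifier
    return None


# Illustrative run: the first attempt shares 1 identifier with the preset set, the second shares 2.
attempts = [{"a", "x"}, {"a", "b", "c"}]
preset = {"a", "b", "d"}
result = acquire_required_identifier(lambda i: attempts[min(i, 1)], preset, 2)
print(sorted(result))  # ['a', 'b', 'c']
```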
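- The method further includes: acquiring the identifiers corresponding to the accounts included in the plurality of data sources; and randomly obtaining, from these identifiers, identifiers other than the first target identifier to obtain a second target identifier, where the second target identifier includes the same number of identifiers as the first target identifier.
- The first target identifier may be used as the positive samples for model training; after it is acquired, the second target identifier may be obtained from all identifiers of the plurality of data sources as the negative samples for model training.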
- After identifiers other than the first target identifier are randomly obtained from the identifiers corresponding to the accounts included in the plurality of data sources, giving the second target identifier, the method further includes: training a prediction model according to the first target identifier and the second target identifier; obtaining, according to the prediction model, a to-be-pushed identifier for a to-be-pushed resource from the identifiers included in the plurality of data sources; and pushing the to-be-pushed resource to the to-be-pushed identifier.
- The acquired first and second target identifiers are used to train the prediction model, so that the to-be-pushed identifiers it produces more accurately represent the population targeted by the predetermined operation, which makes resource pushing more efficient.
- The method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation.
- The technical solution of the present invention, in essence or in the part contributing to the related art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or a CD-ROM), which includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
- An apparatus for acquiring an identifier, used to implement the foregoing method for acquiring an identifier, is further provided. As shown in FIG. 4, the apparatus includes:
- a first obtaining module 42, configured to obtain identifiers corresponding to a predetermined operation from a plurality of data sources, where the account corresponding to each identifier and the operations performed by that account are recorded in a target data source included in the plurality of data sources;
- a second obtaining module 44, configured to obtain an initial identifier from the identifiers according to the feature information of the identifiers and preset feature words, where the feature information indicates features of the predetermined operation;
- a determining module 46, configured to determine the feature parameter of the initial identifier according to a preset weight and the feature information, where the preset weight corresponds to the target data source and indicates how frequently accounts in the target data source perform the predetermined operation, and the feature parameter indicates how frequently the initial identifier performs the predetermined operation;
- a third obtaining module 48, configured to obtain a first target identifier from the initial identifier, where the first target identifier is the set of identifiers in the initial identifier whose feature parameters are higher than a preset parameter.
- The foregoing apparatus for acquiring an identifier may be, but is not limited to being, applied to acquiring identifier samples for model training and using the training result to push resources to a client.
- The client may be, but is not limited to, various types of software, such as search software, social software, instant messaging software, news software, game software, and shopping software.
- For example, the apparatus may be applied to a scenario in which the acquired identifier samples are used for model training and the training result is used to push resources through the client of shopping software, or to a scenario in which the training result is used to push resources through the client of search software.
- The above is only an example, and this embodiment is not limited thereto.
- The plurality of data sources may be various platforms, software, websites, applications, and the like, for example social applications, search engines, e-commerce websites, and advertising platforms.
- An identifier may correspond to different accounts in different data sources.
- For example, a user may have registered account A on a social platform, account B on a shopping website, and account C on an instant messaging application. If the user's three accounts on these platforms are associated, accounts A, B, and C can together be used to uniquely identify the user.
- The target data source may include one or more data sources, namely the data sources in which the account corresponding to the identifier and the operations performed by that account are recorded.
- The identifier corresponding to the predetermined operation may be recorded in one of the plurality of data sources, or in several of them.
- The predetermined operation may be a certain behavior performed by the identifier, or a phrase characterizing that behavior. For example, if the users to be mined are users who purchase maternal and child products, the predetermined operation may be "clicking an entry containing milk powder or diapers", or phrases such as "milk powder" and "diaper".
- To obtain the identifiers corresponding to the predetermined operation from the plurality of data sources, one may first obtain the accounts that searched for "milk powder" and "diaper" in a search engine, the accounts that purchased milk powder or diapers on a shopping website, the accounts that sent messages containing "milk powder", "diaper", and the like in instant messaging software, and the accounts that clicked entries containing milk powder or diapers in the plurality of data sources, and then obtain the identifiers corresponding to these accounts.
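The collection step above, together with the cross-platform account linking described earlier, can be sketched as a scan over per-source behavior logs that maps each matching account to its unified identifier. The log layout, the `account_to_id` table, and the regular expression used to recognize the predetermined operation are all hypothetical.

```python
import re


def identifiers_for_operation(
    logs: list[tuple[str, str, str]],
    account_to_id: dict[tuple[str, str], str],
    operation_pattern: str,
) -> set[str]:
    """Return identifiers whose accounts performed the predetermined operation.

    logs: (data source, account, behavior text) triples from several data sources.
    account_to_id: links (data source, account) to a unified identifier.
    """
    regex = re.compile(operation_pattern)
    found: set[str] = set()
    for source, account, text in logs:
        if regex.search(text):
            identifier = account_to_id.get((source, account))
            if identifier is not None:  # skip accounts that cannot be linked
                found.add(identifier)
    return found


account_to_id = {
    ("social", "A"): "user1",    # one user, three linked accounts
    ("shopping", "B"): "user1",
    ("im", "C"): "user1",
    ("search", "D"): "user2",
}
logs = [
    ("search", "D", "query: milk powder brands"),
    ("shopping", "B", "purchased diaper pack"),
    ("im", "C", "see you tomorrow"),
]
print(sorted(identifiers_for_operation(logs, account_to_id, r"milk powder|diaper")))
# ['user1', 'user2']
```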
- The initial identifier may include, but is not limited to, one or more identifiers.
- The preset feature words may be, but are not limited to, one or more feature words.
- The first target identifier may include, but is not limited to, one or more identifiers.
- the preset weight may be used to indicate a frequency at which an account in the target data source performs a predetermined operation.
- the preset weight can be used to indicate the degree of attention of the account in the target data source to the predetermined operation, which can be expressed, but not limited to, by the frequency with which the account in the target data source performs the predetermined operation.
- The frequency at which accounts in the target data source perform the predetermined operation may be expressed by, but is not limited to, the proportion of accounts in the target data source that frequently perform the predetermined operation (for example, accounts that perform the predetermined operation more than 5 times per day make up 50% of all accounts in the data source).
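The "more than 5 times per day" proportion in the example above can be computed as below. The per-account daily counts are hypothetical, and the strict `>` comparison follows the wording "exceeds 5 times per day".

```python
def frequent_account_share(daily_counts: dict[str, float], threshold: float = 5.0) -> float:
    """Fraction of accounts whose daily count of the predetermined operation exceeds `threshold`."""
    if not daily_counts:
        return 0.0
    frequent = sum(1 for count in daily_counts.values() if count > threshold)
    return frequent / len(daily_counts)


# Hypothetical per-account daily counts in one target data source.
print(frequent_account_share({"acc1": 6, "acc2": 2, "acc3": 12, "acc4": 0}))  # 0.5
```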
- Alternatively, the saliency with which accounts in the target data source perform the predetermined operation may be used to indicate that frequency.
- This saliency can be determined by calculating the proportion of identifiers (for example, the identifiers to which resources were last pushed) among the initial identifiers whose accounts are recorded in the target data source.
- The preset weight may be set directly according to the frequency at which accounts in the target data source perform the predetermined operation, or may be calculated through model training based on that frequency.
- Because the account corresponding to each identifier and the operations performed by that account are recorded in the target data source, obtaining the identifiers corresponding to the predetermined operation from multiple data sources broadens the channels through which identifiers are obtained and avoids the bias of identifiers obtained from a single user log. The initial identifiers are first screened according to the feature information of the identifiers and the preset feature words; a feature parameter representing how frequently each initial identifier performs the predetermined operation is then determined according to the preset weights and the feature information; and the first target identifier, whose feature parameters are higher than the preset parameter, is obtained from the initial identifiers.
- The first target identifier therefore consists of identifiers that frequently perform the predetermined operation, which improves the accuracy of obtaining identifiers for training and overcomes the low accuracy of obtaining identifiers for training in the related art.
- the second obtaining module 44 includes:
- the first obtaining unit 52 is configured to acquire the first feature word and the second feature word, wherein the preset feature word includes the first feature word and the second feature word;
- the second obtaining unit 54 is configured to obtain an initial identifier from the identifier, where the feature information corresponding to the initial identifier carries the first feature word and does not carry the second feature word.
- the preset feature words may include, but are not limited to, a first feature word and a second feature word.
- The preset feature words can represent the characteristics of a type of user population and may include positive representation words and negative representation words. Positive representation words (equivalent to the first feature words) are keywords in the usual sense and represent the feature population. Negative representation words (equivalent to the second feature words) are filter words (filter_words) whose role is denoising: they remove noise introduced by multi-word splicing so that the positive representation words better represent the feature population.
- the initial identifier is obtained from the identifier according to the feature information of the identifier and the first feature word and the second feature word included in the preset feature word, thereby implementing preliminary screening of the identifier.
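The preliminary screening (keep identifiers whose feature information carries a first feature word but no second feature word) can be sketched with regular expressions as follows. The record shape and the example words are hypothetical, and applying the positive/negative test to each record separately, rather than to all of an identifier's records together, is an assumption the text does not settle.

```python
import re


def primary_selection(
    records: list[tuple[str, str]],
    positive_words: list[str],
    negative_words: list[str],
) -> set[str]:
    """Keep identifiers with a record containing a positive word and no negative word.

    records: (identifier, feature representation string) pairs.
    ASSUMPTION: the positive/negative test is applied to each record separately.
    """
    if not positive_words:
        return set()
    positive = re.compile("|".join(map(re.escape, positive_words)))
    negative = re.compile("|".join(map(re.escape, negative_words))) if negative_words else None
    selected: set[str] = set()
    for identifier, text in records:
        if positive.search(text) and not (negative and negative.search(text)):
            selected.add(identifier)
    return selected


records = [
    ("u1", "infant milk powder stage 2"),
    ("u2", "milk powder chocolate bar"),  # spliced phrase: noise for this population
    ("u3", "running shoes"),
]
print(primary_selection(records, ["milk powder", "diaper"], ["chocolate"]))  # {'u1'}
```

`re.escape` keeps any punctuation inside a feature word from being read as regex syntax.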
- the determining module 46 includes:
- the third obtaining unit 62 is configured to acquire a preset weight, wherein a larger value of the preset weight indicates that the frequency of performing the predetermined operation by the account in the target data source is higher;
- the fourth obtaining unit 64 is configured to obtain the time information and the frequency information from the feature information, where the time information indicates the time at which the identifier performs the predetermined operation, and the frequency information indicates how often the identifier performs the predetermined operation;
- the determining unit 66 is configured to determine the feature parameter according to the preset weight, the time information, and the frequency information, wherein the larger the value of the feature parameter is, the higher the frequency at which the initial identifier performs the predetermined operation.
- The third obtaining unit 62 is configured to perform one of the following:
- obtaining the number of identifiers that appear in both a first identifier set and a preset identifier set, where the first identifier set is the set of initial identifiers included in a target data source; and assigning a preset weight to the target data source according to the ratio of this number to the number of identifiers in the first identifier set, where the larger the ratio, the larger the preset weight assigned to the data source.
- For example, suppose there are three target data sources A, B, and C. Target data source A has 100 accounts, 34 of which have performed the predetermined operation; target data source B has 200 accounts, 25 of which have performed the predetermined operation; and target data source C has 100 accounts, 56 of which have performed the predetermined operation. The ratios for target data sources A, B, and C are therefore 34%, 12.5%, and 56%, and according to these ratios, target data sources A, B, and C are assigned preset weights 2, 1, and 3 respectively.
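The example's weights 2, 1, 3 are consistent with ranking the sources by ratio and using the rank as the weight. The sketch below does exactly that; rank-as-weight is an assumption, since the text only requires that a larger ratio receive a larger weight.

```python
def rank_weights(ratios: dict[str, float]) -> dict[str, int]:
    """Assign weight 1 to the smallest ratio, 2 to the next, and so on.

    ASSUMPTION: weight = rank; ties are broken by input order (sorted() is stable).
    """
    ordered = sorted(ratios, key=lambda source: ratios[source])
    return {source: rank for rank, source in enumerate(ordered, start=1)}


# Ratios from the example: accounts performing the operation / all accounts, per source.
ratios = {"A": 34 / 100, "B": 25 / 200, "C": 56 / 100}
weights = rank_weights(ratios)
print(weights["A"], weights["B"], weights["C"])  # 2 1 3
```

Any other monotone mapping (for example, weights proportional to the ratios) would also satisfy the text; rank weights simply reproduce the numbers in the example.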
- The preset identifier set may be, but is not limited to, the identifiers of the target data source within the previously acquired first target identifier, or the identifiers of the target data source among the identifiers to which data was previously pushed.
- Take as an example the case where each preset identifier set consists of the identifiers of a target data source within the previously acquired first target identifier. The preset identifier set A corresponding to target data source A includes 40 identifiers, the preset identifier set B corresponding to target data source B includes 30 identifiers, and the preset identifier set C corresponding to target data source C includes 40 identifiers. Among the initial identifiers, 20, 40, and 40 identifiers are included in target data sources A, B, and C respectively, so the first identifier set A corresponding to target data source A includes 20 identifiers, the first identifier set B corresponding to target data source B includes 40 identifiers, and the first identifier set C corresponding to target data source C includes 40 identifiers.
- Matching the first identifier set A against the preset identifier set A yields 10 identical identifiers; matching the first identifier set B against the preset identifier set B yields 5 identical identifiers; and matching the first identifier set C against the preset identifier set C yields 20 identical identifiers. According to these numbers of identical identifiers, target data sources A, B, and C are assigned preset weights 2, 1, and 3 respectively.
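The set-matching step in this example can be sketched with set intersection. The identifier strings are hypothetical and generated only to reproduce the example's set sizes and overlaps (10, 5, 20). Note that this example ranks by the count of identical identifiers, while the unit description above uses the ratio to the first set's size; with these numbers the ratios for A and C would tie at 50%, so the sketch follows the example and ranks by count. Rank-as-weight is again an assumption.

```python
def overlap_counts(
    first_sets: dict[str, set[str]], preset_sets: dict[str, set[str]]
) -> dict[str, int]:
    """Number of identical identifiers between the first and preset set of each source."""
    return {src: len(ids & preset_sets.get(src, set())) for src, ids in first_sets.items()}


def rank_weights(values: dict[str, float]) -> dict[str, int]:
    """ASSUMPTION: larger value -> larger weight, with weights equal to ranks from 1."""
    ordered = sorted(values, key=lambda src: values[src])
    return {src: rank for rank, src in enumerate(ordered, start=1)}


def make_ids(prefix: str, start: int, stop: int) -> set[str]:
    # Hypothetical identifiers, generated only to reproduce the example's set sizes.
    return {f"{prefix}{i}" for i in range(start, stop)}


first_sets = {"A": make_ids("a", 0, 20), "B": make_ids("b", 0, 40), "C": make_ids("c", 0, 40)}
preset_sets = {"A": make_ids("a", 10, 50), "B": make_ids("b", 35, 65), "C": make_ids("c", 20, 60)}

counts = overlap_counts(first_sets, preset_sets)
print(counts)                # {'A': 10, 'B': 5, 'C': 20}
print(rank_weights(counts))  # {'B': 1, 'A': 2, 'C': 3}
```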
- The determining unit 66 is configured to: calculate, for each target data source, the product of the time information and the frequency information corresponding to the initial identifier; and calculate a weighted sum of these products according to the preset weights to obtain the feature parameter.
- source denotes a data source, and there are n data sources; weight denotes the preset weight of each data source; time denotes the above time information, which can be represented by abs(user behavior occurrence time - current mining time), i.e., the absolute value of the behavior time difference, used as the user behavior time decay parameter: the closer the behavior is to the current time, the larger the feature parameter, and the farther from the current time, the smaller the feature parameter; action denotes the above frequency information and indicates how often the user behavior occurs.
- action is normalized with the sigmoid function; the higher the behavior frequency, the higher the feature parameter.
- The feature parameter of each initial identifier is determined according to the preset weights and the feature information, that is, each initial identifier is scored. The score measures how frequently the initial identifier performs the predetermined operation, so the first target identifier selected from the initial identifiers better represents the predetermined operation. This improves the accuracy of obtaining identifiers for training and thereby overcomes the low accuracy of obtaining identifiers for training in the related art.
- the apparatus further includes:
- a sixth obtaining module, configured to acquire, from the predetermined operation corresponding to the identifier, information indicating a feature of the predetermined operation, where the information includes the feature words corresponding to the predetermined operation, time information, and frequency information;
- a storage module, configured to store the feature words, the time information, and the frequency information in a preset format to obtain the feature information.
- The information indicating the feature of the predetermined operation, acquired from the predetermined operation corresponding to the identifier, is organized into a preset format for storage, which makes comparing feature words faster and more convenient.
- the third obtaining module 48 includes one of the following:
- a processing unit 72, configured to arrange the initial identifiers by feature parameter from high to low and select the first target identifier from the arranged identifiers, where the first target identifier includes the identifiers ranked in the top N positions;
- the fifth obtaining unit 74 is configured to acquire, from the initial identifier, a first target identifier whose value of the feature parameter is greater than or equal to a preset value.
- That is, the feature parameters may be sorted from high to low, and the identifiers ranked in the top N positions are treated as the identifiers whose feature parameters are higher than the preset parameter, giving the first target identifier.
- Alternatively, a preset value may be set, and the identifiers whose feature parameter is greater than or equal to that value are used as the first target identifier.
- the foregoing apparatus further includes:
- a matching module 82, configured to match the first target identifier with the preset target identifier;
- a processing module 84, configured to determine that the first target identifier is the required identifier if the first target identifier and the preset target identifier are successfully matched, and to re-acquire the first target identifier if they are not.
- The matching module 82 is configured to: determine whether the first target identifier and the preset target identifier contain at least a preset number of identical identifiers; and, if they do, determine that the first target identifier is successfully matched with the preset target identifier.
- The preset target identifier may be the first target identifier acquired last time, or a target identifier set specified in advance.
- When the first target identifier is re-acquired, this may be done by, but is not limited to, resetting the predetermined operation and re-acquiring the identifiers corresponding to it, or re-assigning the preset weights of the target data sources.
- The first target identifier is matched with the preset target identifier. If the match succeeds, the currently acquired first target identifier can be considered to meet the needs of model training, that is, it is the required identifier. If the match fails, the currently acquired first target identifier does not meet the needs of model training, and the first target identifier may be re-acquired.
- the foregoing apparatus further includes:
- the fourth obtaining module 92 is configured to acquire an identifier corresponding to the account included in the plurality of data sources;
- the fifth obtaining module 94 is configured to randomly obtain an identifier other than the first target identifier from the identifier corresponding to the account number included in the plurality of data sources, to obtain a second target identifier, where the second target identifier includes The number of identifiers is the same as the number of identifiers included in the first target identifier.
- The first target identifier may be used as the positive samples for model training; after it is acquired, the second target identifier may be obtained from all identifiers of the plurality of data sources as the negative samples for model training.
- the foregoing apparatus further includes:
- a training module, configured to train a prediction model according to the first target identifier and the second target identifier;
- a seventh obtaining module, configured to obtain, according to the prediction model, a to-be-pushed identifier for a to-be-pushed resource from the identifiers included in the plurality of data sources;
- a push module, configured to push the to-be-pushed resource to the to-be-pushed identifier.
- The acquired first and second target identifiers are used to train the prediction model, so that the to-be-pushed identifiers it produces more accurately represent the population targeted by the predetermined operation, which makes resource pushing more efficient.
- The application environment of this embodiment of the present invention may be, but is not limited to, the application environment in the first embodiment, and details are not repeated here.
- An embodiment of the present invention provides an optional specific application example for implementing the foregoing method for obtaining an identifier.
- The method for obtaining the identifier may be, but is not limited to being, applied to the identifier acquisition scenario shown in FIG. 10.
- The plurality of data sources provide data to the server. The server obtains the first target identifier and the second target identifier from this data and trains the prediction model on them; the trained prediction model then selects, from all identifiers, the identifiers for the to-be-pushed resource, and the resource is pushed to the clients logged in with the selected identifiers.
- The plurality of data sources may include social, search, e-commerce, advertising, and mobile application (app) platforms. The behaviors of the identified users in the social/search/e-commerce/advertising/mobile app domains are used as the feature information of the identifiers, and the primarily selected population in each vertical industry is mined through text semantics. The historical effect (saliency) of each target data source is verified by matching the identical identifiers between the first identifier set and the preset identifier set, and each source is given a preset weight according to this saliency.
- The primarily selected identifiers are then sorted according to the preset weights, the frequency information (for example, user behavior frequency), and the time information (for example, a time decay factor), and the identifiers ranked in the top N positions form the first target identifier. Cross-validation against historical effect is performed by matching the first target identifier with the preset target identifier, so that positive samples for the training data can be selected effectively. Finally, the selected positive sample set is subtracted from the overall active user population, and a second target identifier of the same size is randomly drawn from the remaining set as the negative sample set. In this way, the server obtains the first target identifier and the second target identifier.
- Positive and negative samples of the training data are obtained through text semantic feature mining that integrates users' behavior characteristics across the social/search/e-commerce/advertising/mobile app domains. A user behavior frequency factor (the frequency information) and a behavior time decay factor (the time information) are then combined with historical-effect verification of users' different behaviors, which gives different behaviors different weight factors (the preset weights), to produce a score for each user (the feature parameter).
- Sorting by this score makes it possible to control the purity of the positive samples (the first target identifier), and the identifiers ranked in the top N positions can be freely selected as positive samples of the training data as needed. This solves the problems of single-source user behavior and low positive-sample purity.
- In this way, the behavior characteristics of users in various Internet scenarios can be integrated, the identifiers corresponding to a user population with a specific representative meaning can be mined, and positive and negative samples of higher purity can be obtained through verification.
- the foregoing server in this embodiment may include the following functional modules:
- A feature representation word collection module, configured to define feature representation words (corresponding to the preset feature words) according to the characteristics of the identifiers of the specific population to be screened. These include positive representation words (corresponding to the first feature words), i.e., keywords in the usual sense, and negative representation words (corresponding to the second feature words), i.e., filter words (filter_words). The negative representation words serve to denoise, i.e., to remove noise produced by multi-word splicing, so that the positive representation words better represent the feature population.
- A user multi-behavior feature fusion module, configured to extract, from the users' various behaviors in the social/search/e-commerce/advertising/mobile app domains, the key elements (user identifier - feature representation string - time information - frequency information).
- A pattern matching module, configured to apply the feature representation words from the feature representation word collection module, by pattern matching, to the multi-behavior data (user identifier - feature representation string - time information - frequency information) from the user multi-behavior feature fusion module, and to take the user identifiers whose data contains a positive representation word but no negative representation word as primary selection identifiers.
- A user scoring module, configured to score the primary selection identifiers from the pattern matching module (i.e., to obtain their feature parameters). Scoring has two parts: calculating the preset weight of each data source, and calculating the behavior score of each primary selection identifier. The weight can be calculated in two ways. In the first, the data sources are divided into population packages, the saliency of each package on a single target data source is verified by matching the identical identifiers between the first identifier set and the preset identifier set, and the preset weight of the current data source is assigned according to the relative value of the saliency. In the second, the final weights are obtained through model training, for example logistic regression (LR): an initial weight value is first assigned to each data source, a model is then trained on the small-scale primarily selected positive and negative samples with each data source as a feature, and after iterating to convergence the model outputs the preset weight of each data source.
- each initial identifier is scored according to the following formula:
- source represents the data source
- weight represents the preset weight on each data source
- time is time information, in this example, abs (user behavior occurrence time - current Mining time), that is, the absolute value of the behavior time difference, as the user behavior time decay parameter, that is, the closer the behavior occurs to the current time, the larger the score is, the farther from the current time, the smaller the score
- the action is
- frequency is the frequency information, which represents how often the user identifier performed the behavior;
- the sigmoid of the frequency is taken for normalization, so the more frequent the behavior, the higher the score.
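The scoring formula itself did not survive extraction. The sketch below follows the later description of the feature parameter as a weighted sum, over data sources, of the product of a time term and a frequency term: score = sum over sources of weight(source) x decay(|now - t|) x sigmoid(frequency). The exponential decay and the `decay_rate` parameter are assumptions chosen only to satisfy "closer in time gives a larger score"; the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def behavior_score(behaviors, weights, now, decay_rate=0.1):
    """Weighted sum over a user's behaviors of time decay times normalized frequency.

    behaviors: list of (source, event_time, frequency) for one identifier; times in days.
    weights: preset weight per data source.
    decay_rate: hypothetical tuning parameter for the exponential time decay.
    """
    score = 0.0
    for source, event_time, frequency in behaviors:
        time_term = math.exp(-decay_rate * abs(now - event_time))  # 1.0 at "now", shrinks with age
        freq_term = sigmoid(frequency)  # grows toward 1.0 as the frequency increases
        score += weights.get(source, 0.0) * time_term * freq_term
    return score


weights = {"search": 0.4, "shopping": 0.6}
recent = behavior_score([("shopping", 30, 5)], weights, now=30)
stale = behavior_score([("shopping", 0, 5)], weights, now=30)
print(recent > stale)  # True
```

Any monotonically decreasing decay would fit the description equally well; the exponential form simply keeps the time term in (0, 1].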
- the positive and negative sample selection module is configured to rank the primary selection population by the scores from the user scoring module and select the identifiers ranked in the top N (the value of N may vary with the targeted identifiers to be mined and with the feature parameters);
- the top-N identifiers are the positive samples;
- the positive sample set is excluded from the identifiers of the active users in the overall population, and a population of the same size as the positive sample set is randomly selected from the remaining set;
- the identifiers of this population are the negative samples.
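The module's selection rule (top-N positives, then an equally sized random negative set drawn from the active users who are not positives) can be sketched as follows. The function name and the optional `seed` argument are hypothetical additions for reproducibility.

```python
import random


def select_samples(scores, active_ids, n, seed=None):
    """Split identifiers into top-N positives and an equally sized random negative set.

    scores: dict identifier -> behavior score of the primary selection population.
    active_ids: identifiers of the active users in the overall population.
    """
    positives = sorted(scores, key=scores.get, reverse=True)[:n]
    # Exclude the positives, then sample negatives of the same size from what remains.
    pool = sorted(set(active_ids) - set(positives))  # sorted so a fixed seed is reproducible
    rng = random.Random(seed)
    negatives = rng.sample(pool, min(len(positives), len(pool)))
    return positives, negatives


scores = {"a": 0.9, "b": 0.5, "c": 0.7}
positives, negatives = select_samples(scores, {"a", "b", "c", "d", "e", "f"}, n=2, seed=0)
print(positives)  # ['a', 'c']
```

Capping the sample size with `min(...)` avoids a `ValueError` from `random.sample` when the remaining pool is smaller than the positive set.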
- an electronic device for implementing the above method for obtaining an identifier is further provided.
- the electronic device may include one or more processors 201 (only one is shown in the figure), a memory 203, and a transmission device 205; as shown in FIG. 11, it may further include an input/output device 207.
- the memory 203 may be used to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for obtaining an identifier in the embodiments of the present invention.
- the processor 201 runs the software programs and modules stored in the memory 203 to perform various functional applications and data processing, that is, to implement the above method for obtaining an identifier.
- Memory 203 can include high speed random access memory, and can also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
- memory 203 can further include memory remotely located relative to processor 201, which can be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
- the above described transmission device 205 is used to receive or transmit data via a network, and can also be used for data transmission between the processor and the memory. Specific examples of the above network may include a wired network and a wireless network.
- the transmission device 205 includes a Network Interface Controller (NIC) that can be connected to other network devices and routers via a network cable to communicate with the Internet or a local area network.
- the transmission device 205 is a Radio Frequency (RF) module for communicating with the Internet wirelessly.
- the memory 203 is used to store an application.
- the processor 201 may call, through the transmission device 205, the application stored in the memory 203 to perform the following steps: obtaining identifiers corresponding to a predetermined operation from a plurality of data sources, where a target data source included in the plurality of data sources records the accounts corresponding to the identifiers and the predetermined operation performed by those accounts; obtaining initial identifiers from the identifiers according to the feature information of the identifiers and preset feature words, where the feature information represents features of the predetermined operation; determining a feature parameter of each initial identifier according to a preset weight and the feature information, where the preset weight corresponds to the target data source and indicates how frequently accounts in the target data source perform the predetermined operation, and the feature parameter indicates how frequently the initial identifier performs the predetermined operation; and obtaining a first target identifier from the initial identifiers, where the first target identifier is the set of initial identifiers whose feature parameter is higher than a preset parameter.
- the processor 201 is further configured to: acquire a first feature word and a second feature word, where the preset feature words include the first feature word and the second feature word; and obtain the initial identifiers from the identifiers, where the feature information corresponding to an initial identifier carries the first feature word and does not carry the second feature word.
- the processor 201 is further configured to perform the following steps: acquiring the preset weight, where a larger preset weight indicates that accounts in the target data source perform the predetermined operation more frequently; obtaining time information and frequency information from the feature information, where the time information indicates when the identifier performed the predetermined operation and the frequency information indicates how often it did so; and determining the feature parameter according to the preset weight, the time information, and the frequency information, where a larger feature parameter indicates that the initial identifier performs the predetermined operation more frequently.
- the processor 201 is further configured to perform: acquiring the proportion of accounts in the target data source that performed the predetermined operation among all accounts included in the target data source, and
- allocating the preset weight to the target data source according to the proportion, where the larger the proportion, the larger the preset weight allocated to the data source; or acquiring the number of identifiers common to the first identifier set and the preset identifier set, where
- the first identifier set is the set of initial identifiers included in one target data source, and allocating the preset weight to the target data source according to the ratio of that number to the number of identifiers in the first identifier set,
- where the larger the ratio, the larger the preset weight allocated to the data source.
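The first alternative (weighting a data source by the share of its accounts that performed the predetermined operation) can be sketched as below. The text only requires that a larger proportion yield a larger weight; normalizing the shares to sum to 1 is an assumption, and `proportion_weights` is a hypothetical name.

```python
def proportion_weights(active_counts, total_counts):
    """Weight each data source by the share of its accounts that performed the operation.

    active_counts: data source -> number of accounts that performed the predetermined operation.
    total_counts: data source -> number of accounts in that source.
    The shares are normalized to sum to 1; the normalization is an assumption.
    """
    shares = {
        source: active_counts.get(source, 0) / total
        for source, total in total_counts.items()
        if total > 0
    }
    norm = sum(shares.values())
    return {source: s / norm for source, s in shares.items()} if norm else shares


w = proportion_weights({"video": 100, "news": 100}, {"video": 1000, "news": 500})
print(w)  # video: about 1/3, news: about 2/3
```

Because "news" has the same number of active accounts out of half as many total accounts, its share (0.2) is double that of "video" (0.1).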
- the processor 201 is further configured to: calculate, for each target data source, the product of the time information and the frequency information corresponding to the initial identifier; and calculate the weighted sum of these products according to the preset weights to obtain the feature parameter.
- the processor 201 is further configured to: obtain, from the predetermined operation corresponding to the identifier, information indicating features of the predetermined operation, including the feature words corresponding to the predetermined operation, the time information, and the frequency information; and store the feature words, the time information, and the frequency information in a preset format to obtain the feature information.
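Storing the feature words, time, and frequency "in a preset format" could look like the record type below. The tab separator and the class and method names are assumptions; the field order follows the identifier-feature string-time-frequency layout used by the fusion module. A tab is used so that feature words containing spaces or hyphens survive a round trip.

```python
from dataclasses import dataclass

SEP = "\t"  # hypothetical separator; the text only specifies "a preset format"


@dataclass(frozen=True)
class FeatureInfo:
    user_id: str
    feature_words: str  # feature words of the predetermined operation
    time: int           # when the operation happened, e.g. a Unix timestamp
    frequency: int      # how many times it happened

    def to_line(self) -> str:
        # Field order: identifier, feature string, time, frequency.
        return SEP.join([self.user_id, self.feature_words, str(self.time), str(self.frequency)])

    @classmethod
    def from_line(cls, line: str) -> "FeatureInfo":
        user_id, words, time, frequency = line.split(SEP)
        return cls(user_id, words, int(time), int(frequency))


info = FeatureInfo("u1", "used car loan", 1700000000, 3)
line = info.to_line()
print(FeatureInfo.from_line(line) == info)  # True
```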
- the processor 201 is further configured to perform one of the following: arranging the initial identifiers in descending order of feature parameter and selecting the first target identifier from the arranged identifiers, where the first target identifier includes the top N identifiers; or obtaining, from the initial identifiers, the first target identifier whose feature parameter is greater than or equal to a preset value.
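The two alternative ways of choosing the first target identifier (top N by feature parameter, or all identifiers at or above a preset value) can be sketched as follows; the function names are hypothetical.

```python
import heapq


def first_target_top_n(scores, n):
    """Variant 1: the N identifiers with the highest feature parameter."""
    return heapq.nlargest(n, scores, key=scores.get)


def first_target_threshold(scores, preset_value):
    """Variant 2: every identifier whose feature parameter is >= preset_value."""
    return {ident for ident, score in scores.items() if score >= preset_value}


scores = {"a": 3.0, "b": 1.0, "c": 2.0}
print(first_target_top_n(scores, 2))        # ['a', 'c']
print(first_target_threshold(scores, 2.0))  # {'a', 'c'} (set order may vary)
```

`heapq.nlargest` avoids sorting the whole population when N is small relative to the number of initial identifiers.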
- the processor 201 is further configured to: match the first target identifier against a preset target identifier; if they match successfully, determine that the first target identifier is the required identifier; and if they fail to match, re-acquire the first target identifier.
- the processor 201 is further configured to: determine whether the first target identifier and the preset target identifier have at least a preset number of identifiers in common; and if they do, determine that the first target identifier successfully matches the preset target identifier.
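The match-then-reacquire check can be sketched as below. The overlap test follows the text directly; how re-acquisition changes the candidates is not specified, so `acquire` is a hypothetical callable (for example, one that retries top-N selection with a different N), and `max_attempts` is an assumed safeguard against looping forever.

```python
def matches_preset(first_target, preset_target, preset_number):
    """Success when the two sets share at least preset_number identifiers."""
    return len(set(first_target) & set(preset_target)) >= preset_number


def acquire_validated(acquire, preset_target, preset_number, max_attempts=3):
    """Re-acquire the first target identifier until it matches the preset target.

    acquire: hypothetical callable taking the attempt index and returning candidate
    identifiers. Returns None if no attempt matches within max_attempts.
    """
    for attempt in range(max_attempts):
        candidate = acquire(attempt)
        if matches_preset(candidate, preset_target, preset_number):
            return candidate
    return None


attempts = [["x", "y"], ["b", "c", "z"]]
print(acquire_validated(lambda i: attempts[i], {"b", "c", "d"}, preset_number=2))
# ['b', 'c', 'z']
```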
- the processor 201 is further configured to: acquire the identifiers corresponding to the accounts included in the plurality of data sources; and randomly select, from those identifiers other than the first target identifier, identifiers that form a second target identifier, where the second target identifier contains the same number of identifiers as the first target identifier.
- the processor 201 is further configured to: train a prediction model according to the first target identifier and the second target identifier; obtain, according to the prediction model, a to-be-pushed identifier for a to-be-pushed resource from the identifiers included in the plurality of data sources;
- and push the to-be-pushed resource to the to-be-pushed identifier.
- An embodiment of the present invention provides a solution for obtaining an identifier.
- identifiers corresponding to a predetermined operation are obtained from a plurality of data sources, and initial identifiers are obtained from them according to their feature information and preset feature words, where the feature information represents features of the predetermined operation; the feature parameter of each initial identifier is determined according to the preset weight and the feature information, where the preset weight corresponds to the target data source and indicates how frequently accounts in the target data source perform the predetermined operation, and the feature parameter indicates how frequently the initial identifier performs the predetermined operation; and a first target identifier, the set of initial identifiers whose feature parameter is higher than a preset parameter, is obtained from the initial identifiers.
- because the target data source records the accounts corresponding to the identifiers and the predetermined operations those accounts performed, and the identifiers corresponding to the predetermined operation are obtained from it, identifiers can be drawn from a wider range of sources, which avoids the bias caused by the small scale of identifiers obtained from a single user log.
- the initial identifiers are first filtered according to the identifiers' feature information and the preset feature words, and each is assigned a feature parameter, determined from the preset weight and the feature information, that indicates how frequently it performs the predetermined operation.
- the first target identifier, whose feature parameters are higher than the preset parameter, is then taken from the initial identifiers, so that every identifier it contains performs the predetermined operation frequently. This improves the accuracy of the identifiers obtained for training
- and thereby overcomes the low accuracy of obtaining identifiers for training in the related art.
- FIG. 11 is merely illustrative; the electronic device may be a terminal device such as a smartphone (for example, an Android or iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
- FIG. 11 does not limit the structure of the above electronic device.
- the electronic device may also include more or fewer components (such as a network interface or a display device) than shown in FIG. 11, or have a configuration different from that shown in FIG. 11.
- Embodiments of the present invention also provide a storage medium.
- the foregoing storage medium may be located in at least one of the plurality of network devices in the network.
- the storage medium is arranged to store program code for performing the following steps:
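- S1: obtaining identifiers corresponding to a predetermined operation from a plurality of data sources, where a target data source included in the plurality of data sources records the accounts corresponding to the identifiers and the predetermined operations performed by those accounts;
- S2: obtaining initial identifiers from the identifiers according to the feature information of the identifiers and preset feature words, where the feature information represents features of the predetermined operation;
- S3: determining a feature parameter of each initial identifier according to a preset weight and the feature information, where the preset weight corresponds to the target data source and indicates how frequently accounts in the target data source perform the predetermined operation, and the feature parameter indicates how frequently the initial identifier performs the predetermined operation;
- S4: obtaining a first target identifier from the initial identifiers, where the first target identifier is the set of initial identifiers whose feature parameter is higher than a preset parameter.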
- the storage medium is further arranged to store program code for performing the following steps:
- S1 acquiring a first feature word and a second feature word, where the preset feature word includes a first feature word and a second feature word;
- S2: obtaining the initial identifiers from the identifiers, where the feature information corresponding to an initial identifier carries the first feature word and does not carry the second feature word.
- the storage medium is further configured to store program code for performing the following steps: obtaining a preset weight, wherein a larger value of the preset weight indicates that the account in the target data source has a higher frequency of performing the predetermined operation;
- obtaining the time information and the frequency information from the feature information, where the time information indicates when the identifier performed the predetermined operation and the frequency information indicates how often it did so; and determining the feature parameter according to the preset weight, the time information, and the frequency information,
- the larger the value of the feature parameter is, the higher the frequency at which the initial identifier performs the predetermined operation.
- the storage medium is further configured to store program code for performing the following steps: acquiring the proportion of accounts in the target data source that performed the predetermined operation among all accounts included in the target data source, and allocating the preset weight to the target data source according to the proportion,
- where the larger the proportion, the larger the preset weight allocated to the data source; or acquiring the number of identifiers common to the first identifier set and the preset identifier set, where the first identifier set is the set of initial identifiers included in one target data source, and allocating the preset weight to the target data source according to the ratio of that number to the number of identifiers in the first identifier set, where the larger the ratio, the larger the preset weight allocated to the data source.
- the storage medium is further configured to store program code for performing the following steps: calculating, for each target data source, the product of the time information and the frequency information corresponding to the initial identifier; and calculating the weighted sum of the products according to the preset weights to obtain the feature parameter.
- the storage medium is further configured to store program code for: obtaining, from the predetermined operation corresponding to the identifier, information indicating features of the predetermined operation, including the feature words corresponding to the predetermined operation, the time information, and the frequency information; and storing the feature words, the time information, and the frequency information in a preset format to obtain the feature information.
- the storage medium is further configured to store program code for performing the following steps: arranging the initial identifiers in descending order of feature parameter and selecting the first target identifier from the arranged identifiers, where the first target identifier includes the top N identifiers; or obtaining, from the initial identifiers, the first target identifier whose feature parameter is greater than or equal to the preset value.
- the storage medium is further configured to store program code for performing the following steps: matching the first target identifier against the preset target identifier; if they match successfully, determining that the first target identifier is the required identifier; and if they fail to match, re-acquiring the first target identifier.
- the storage medium is further configured to store program code for performing the following steps: determining whether the first target identifier and the preset target identifier have at least the preset number of identifiers in common; and if they do, determining that the first target identifier successfully matches the preset target identifier.
- the storage medium is further configured to store program code for performing the following steps: acquiring the identifiers corresponding to the accounts included in the plurality of data sources; and randomly selecting, from those identifiers other than the first target identifier, identifiers that form a second target identifier, where the second target identifier contains the same number of identifiers as the first target identifier.
- the storage medium is further configured to store program code for performing the following steps: training the prediction model according to the first target identifier and the second target identifier; obtaining, according to the prediction model, the to-be-pushed identifier for the to-be-pushed resource from the identifiers included in the plurality of data sources; and pushing the to-be-pushed resource to the to-be-pushed identifier.
- the foregoing storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, and a magnetic memory.
- the integrated unit in the above embodiment, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in the above computer-readable storage medium.
- the technical solution of the present invention, or the part of it that contributes to the related art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes
- instructions that cause one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
- the disclosed client may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division into units is merely a division by logical function.
- multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, unit or module, and may be electrical or otherwise.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
- the present invention records, in the target data source, the accounts corresponding to the identifiers and the predetermined operations performed by those accounts, and obtains from it the identifiers corresponding to the predetermined operation, so that identifiers can be obtained from a wider range of sources and reliance on a single user log is avoided.
Abstract
Description
This application claims priority to Chinese Patent Application No. 201710290180.5, entitled "Method and apparatus for obtaining identifier", filed with the Chinese Patent Office on April 27, 2017, which is incorporated herein by reference in its entirety.
The present invention relates to the field of computers, and in particular to a method and apparatus for obtaining an identifier, a storage medium, and an electronic device.
In many recommendation fields, such as advertisement, game, video, and news recommendation, resources often need to be delivered to users in a specific domain (targeted users) to improve delivery effectiveness. Targeted users are usually mined by training a prediction model, such as logistic regression (LR), random forest (RF), or a gradient boosting decision tree (GBDT). The key to the effectiveness of any of these models is the accuracy of the training samples (which may be user identifiers) selected during training, that is, whether the positive and negative samples are selected precisely enough. Genuine positive samples are usually obtained as real, reliable positive data from customer relationship management (CRM), but such data is often small in scale, so the features of the trained model are not distinctive enough and model training suffers.
Most existing approaches to obtaining training samples match users against rules based on their behavior in a single data source and use the matched population as the positive sample set, while the negative sample set is randomly selected from the overall user base. Relying on a single data source easily biases the samples, the resulting sample sets are relatively small, and the purity of individual samples in the selected set is hard to distinguish.
In an existing approach to obtaining training samples, as shown in FIG. 1, sample representation words and optimization rules are prepared for the specific population to be mined. The population whose behavior carries the sample representation words is mined from a single user behavior log by pattern matching (regular-expression matching) and used as the positive training population, and the negative population is randomly selected from the overall population after the positive population is excluded. This approach has the following defects: first, with only a single user behavior log, the matched population is limited and the samples are easily biased; second, pattern matching alone is insufficient to establish the purity and reliability of the positive samples. As a result, identifiers for training obtained in this way have low accuracy.
No effective solution to the above problems has yet been proposed.
Summary of the Invention
The embodiments of the present invention provide a method and apparatus for obtaining an identifier, a storage medium, and an electronic device, to at least solve the technical problem in the related art that identifiers obtained for training have low accuracy.
According to one aspect of the embodiments of the present invention, a method for obtaining an identifier is provided, including: obtaining identifiers corresponding to a predetermined operation from a plurality of data sources, where a target data source included in the plurality of data sources records the accounts corresponding to the identifiers and the predetermined operation performed by those accounts; obtaining initial identifiers from the identifiers according to feature information of the identifiers and preset feature words, where the feature information represents features of the predetermined operation; determining a feature parameter of each initial identifier according to a preset weight and the feature information, where the preset weight corresponds to the target data source and indicates how frequently accounts in the target data source perform the predetermined operation, and the feature parameter indicates how frequently the initial identifier performs the predetermined operation; and obtaining a first target identifier from the initial identifiers, where the first target identifier is the set of initial identifiers whose feature parameter is higher than a preset parameter.
According to another aspect of the embodiments of the present invention, an apparatus for obtaining an identifier is further provided, including: a first obtaining module, configured to obtain identifiers corresponding to a predetermined operation from a plurality of data sources, where a target data source included in the plurality of data sources records the accounts corresponding to the identifiers and the predetermined operation performed by those accounts; a second obtaining module, configured to obtain initial identifiers from the identifiers according to feature information of the identifiers and preset feature words, where the feature information represents features of the predetermined operation; a determining module, configured to determine a feature parameter of each initial identifier according to a preset weight and the feature information, where the preset weight corresponds to the target data source and indicates how frequently accounts in the target data source perform the predetermined operation, and the feature parameter indicates how frequently the initial identifier performs the predetermined operation; and a third obtaining module, configured to obtain a first target identifier from the initial identifiers, where the first target identifier is the set of initial identifiers whose feature parameter is higher than a preset parameter.
According to another aspect of the embodiments of the present invention, a storage medium is further provided. The storage medium includes a stored program, and when the program runs, it controls the device on which the storage medium resides to perform the above method for obtaining an identifier.
In the embodiments of the present invention, identifiers corresponding to a predetermined operation are obtained from a plurality of data sources, where a target data source included in the plurality of data sources records the accounts corresponding to the identifiers and the predetermined operation performed by those accounts; initial identifiers are obtained from the identifiers according to their feature information and preset feature words, where the feature information represents features of the predetermined operation; a feature parameter of each initial identifier is determined according to a preset weight and the feature information, where the preset weight corresponds to the target data source and indicates how frequently accounts in the target data source perform the predetermined operation, and the feature parameter indicates how frequently the initial identifier performs the predetermined operation; and a first target identifier is obtained from the initial identifiers, where the first target identifier is the set of initial identifiers whose feature parameter is higher than a preset parameter.
That is, the target data source records the accounts corresponding to the identifiers and the predetermined operations those accounts performed, and the identifiers corresponding to the predetermined operation are obtained from it. Identifiers can therefore be obtained from a wider range of sources, which avoids the bias caused by the small scale of identifiers obtained from a single user log. The initial identifiers are first filtered according to the identifiers' feature information and the preset feature words, and each initial identifier is assigned a feature parameter, based on the preset weight and the feature information, that indicates how frequently it performs the predetermined operation. The first target identifier, whose feature parameters are higher than the preset parameter, is then taken from the initial identifiers, so that every identifier it contains performs the predetermined operation frequently. This improves the accuracy of the identifiers obtained for training and thereby overcomes the low accuracy of obtaining identifiers for training in the related art.
The drawings described herein provide a further understanding of the present invention and constitute a part of this application. The exemplary embodiments of the present invention and their descriptions explain the invention and do not constitute an improper limitation of it. In the drawings:
FIG. 1 is a schematic diagram of a method for acquiring an identifier according to the related art;
FIG. 2 is a schematic diagram of an application environment of an optional method for acquiring an identifier according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an optional method for acquiring an identifier according to an embodiment of the present invention;
FIG. 4 is a first schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention;
FIG. 5 is a second schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention;
FIG. 6 is a third schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention;
FIG. 7 is a fourth schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention;
FIG. 8 is a fifth schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention;
FIG. 9 is a sixth schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of an application scenario of an optional method for acquiring an identifier according to an embodiment of the present invention; and
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are merely some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of the present invention distinguish between similar objects. They do not necessarily describe a particular order or sequence. Data so labeled are interchangeable where appropriate, so the embodiments described herein can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed. It may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
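The embodiments of the present invention provide an embodiment of the above method for acquiring an identifier. As an optional implementation, the method may be applied, without limitation, in the application environment shown in FIG. 2. In that environment, a server 202 is configured to:
- obtain identifiers corresponding to a predetermined operation from a plurality of data sources;
- obtain initial identifiers from the identifiers according to the feature information of the identifiers and preset feature words;
- determine a feature parameter of each initial identifier according to preset weights and the feature information; and
- obtain first target identifiers from the initial identifiers.

Here, a target data source among the plurality of data sources records the accounts corresponding to the identifiers and the operations those accounts have performed. The feature information represents features of the predetermined operation. Each preset weight corresponds to a target data source and indicates how frequently accounts in that source perform the predetermined operation. The feature parameter indicates how frequently an initial identifier performs the predetermined operation. The first target identifiers are the set of initial identifiers whose feature parameter is higher than a preset parameter.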
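In this embodiment, the target data source records the accounts corresponding to the identifiers and the operations those accounts have performed, and the server 202 obtains from it the identifiers corresponding to the predetermined operation. This widens the channels for acquiring identifiers and avoids the bias caused by mining identifiers from a single, small-scale user log. The server then preliminarily screens initial identifiers according to the feature information of the identifiers and the preset feature words. For each initial identifier, it determines a feature parameter from the preset weights and the feature information, expressing how frequently that identifier performs the predetermined operation. It then selects from the initial identifiers the first target identifiers whose feature parameter is higher than the preset parameter. Every identifier in the first target identifiers is thus one that performs the predetermined operation relatively frequently. This improves the accuracy of the identifiers obtained for training and overcomes the low accuracy of the related art.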
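Optionally, in this embodiment, the server 202 is configured to: obtain a first feature word and a second feature word, where the preset feature words include the first feature word and the second feature word; and obtain the initial identifiers from the identifiers, where the feature information corresponding to an initial identifier carries the first feature word and does not carry the second feature word.
Optionally, in this embodiment, the server 202 is configured to: obtain the preset weights, where a larger preset weight indicates that accounts in the target data source perform the predetermined operation more frequently; obtain time information and frequency information from the feature information, where the time information indicates when the identifier performed the predetermined operation and the frequency information indicates how many times it performed it; and determine the feature parameter according to the preset weights, the time information, and the frequency information, where a larger feature parameter indicates that the initial identifier performs the predetermined operation more frequently.
Optionally, in this embodiment, the server 202 is configured to assign the preset weight to a target data source in one of two ways. In the first way, it obtains the proportion of accounts in the target data source that performed the predetermined operation among all accounts in that source, and assigns the weight according to that proportion, with a larger proportion receiving a larger weight. In the second way, it obtains the number of identifiers shared by a first identifier set and a preset identifier set, where the first identifier set is the set of initial identifiers included in one target data source. It then assigns the weight according to the ratio of that number to the number of identifiers in the first identifier set, with a larger ratio receiving a larger weight.
Optionally, in this embodiment, the server 202 is configured to: compute, for each target data source, the product of the initial identifier's time information and frequency information; and compute the weighted sum of these products using the preset weights to obtain the feature parameter.
Optionally, in this embodiment, the server 202 is configured to: obtain, from the predetermined operations corresponding to an identifier, information representing features of the predetermined operation, including the feature words corresponding to the predetermined operation, the time information, and the frequency information; and store the feature words, time information, and frequency information in a preset format to obtain the feature information.
Optionally, in this embodiment, the server 202 is configured to obtain the first target identifiers in one of two ways. In the first way, it sorts the initial identifiers by feature parameter from high to low and takes the top N identifiers after sorting. In the second way, it takes from the initial identifiers those whose feature parameter is greater than or equal to a preset value.
Optionally, in this embodiment, the server 202 is configured to: match the first target identifiers against preset target identifiers; if the match succeeds, determine that the first target identifiers are the required identifiers; and if the match fails, re-acquire the first target identifiers.
Optionally, in this embodiment, the server 202 is further configured to: determine whether the first target identifiers and the preset target identifiers share at least a preset number of identifiers; and if so, determine that the first target identifiers successfully match the preset target identifiers.
Optionally, in this embodiment, the server 202 is further configured to: obtain the identifiers corresponding to the accounts included in the plurality of data sources; and randomly obtain, from those identifiers, identifiers other than the first target identifiers to obtain second target identifiers, where the number of second target identifiers equals the number of first target identifiers.
Optionally, the application environment described in this embodiment may further include a client connected to the server 202 through a network. The server 202 is further configured to: train a prediction model on the first target identifiers and the second target identifiers; use the prediction model to obtain, from the identifiers included in the plurality of data sources, the identifiers to which a resource is to be pushed; and push the resource to the clients used by the accounts corresponding to those identifiers.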
Optionally, in this embodiment, the client may include, without limitation, at least one of the following: a mobile phone, a tablet computer, a notebook computer, a desktop PC, a digital television, and other hardware devices that perform area sharing. The network may include, without limitation, at least one of the following: a wide area network, a metropolitan area network, and a local area network. These are merely examples and do not limit this embodiment.
According to an embodiment of the present invention, a method for acquiring an identifier is provided. As shown in FIG. 3, the method includes:
S302: Obtain identifiers corresponding to a predetermined operation from a plurality of data sources, where a target data source among the plurality of data sources records the accounts corresponding to the identifiers and the operations those accounts have performed;
S304: Obtain initial identifiers from the identifiers according to the feature information of the identifiers and preset feature words, where the feature information represents features of the predetermined operation;
S306: Determine a feature parameter of each initial identifier according to preset weights and the feature information, where each preset weight corresponds to a target data source and indicates how frequently accounts in that source perform the predetermined operation, and the feature parameter indicates how frequently the initial identifier performs the predetermined operation;
S308: Obtain first target identifiers from the initial identifiers, where the first target identifiers are the set of initial identifiers whose feature parameter is higher than a preset parameter.
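Steps S302 to S308 can be sketched end to end in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation:
- Each data source is modeled as a list of hypothetical `Event` records.
- Keyword matching is plain substring matching, applied per event.
- The time decay exp(-|now - t| / 1 day) is an assumed form; the text only requires that more recent behavior score higher.
- All function and field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical behavior record for one identifier in one data source.
    identifier: str
    text: str         # words describing the operation (query, item title, message)
    timestamp: float  # when the operation happened, in seconds
    count: int        # how many times it happened


def acquire_first_targets(sources, weights, keywords, filter_words, now, top_n):
    """sources: dict source -> list[Event]; weights: dict source -> preset weight."""
    # S302: pull events tied to the predetermined operation from every data source.
    hits = [(src, e) for src, events in sources.items() for e in events
            if any(k in e.text for k in keywords)]
    # S304: drop events that also contain a filter (negative) word.
    hits = [(src, e) for src, e in hits if not any(f in e.text for f in filter_words)]
    # S306: feature parameter = weighted sum of (time decay * normalized frequency).
    scores = {}
    for src, e in hits:
        decay = math.exp(-abs(now - e.timestamp) / 86400.0)  # assumed decay form
        freq = 1.0 / (1.0 + math.exp(-e.count))               # sigmoid normalization
        scores[e.identifier] = scores.get(e.identifier, 0.0) + weights[src] * decay * freq
    # S308: keep the top_n highest-scoring identifiers as first target identifiers.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Here the "preset parameter" of S308 is realized as a top-N cut. A fixed score threshold would work equally well, as the later optional solution describes.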
Optionally, in this embodiment, the method may be applied, without limitation, to scenarios in which identifier samples are acquired for model training and the training results are used to push resources to clients. The client may be, without limitation, any type of software, such as search, social networking, instant messaging, news, gaming, or shopping software. For example, the identifier samples may train a model whose results are used to push resources to the client of shopping software, or to the client of search software. These are merely examples and do not limit this embodiment.
Optionally, in this embodiment, the data sources may be various platforms, software, websites, applications, and the like, for example social applications, search engines, e-commerce websites, and advertising platforms.
Optionally, in this embodiment, one identifier may correspond to different accounts in different data sources. For example, a user may register account A on a social platform, account B on a shopping website, and account C on an instant messaging application. If the user links these three accounts, accounts A, B, and C can all correspond to the same identifier, which uniquely identifies the user.
Optionally, in this embodiment, the target data source may include one or more data sources. That is, a data source records the account in that source that corresponds to an identifier, together with the operations that account has performed. An identifier corresponding to the predetermined operation may be recorded in only one of the plurality of data sources, or in several of them.
Optionally, in this embodiment, the predetermined operation may be a behavior performed by the identifier, or a phrase that characterizes that behavior. For example, suppose the users to be mined are buyers of maternal and infant products. The predetermined operation may then be "clicking an entry containing milk powder or diapers", or phrases such as "milk powder" and "diaper". To obtain the matching identifiers from the plurality of data sources, first collect the following accounts:
- accounts that searched for "milk powder" or "diaper" in a search engine;
- accounts that bought milk powder or diapers on a shopping website;
- accounts that sent messages containing phrases such as "milk powder" or "diaper" in instant messaging software;
- accounts that clicked entries containing milk powder or diapers in any of the data sources.

Then obtain the identifiers corresponding to these accounts.
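The collection step in this example can be sketched as follows. The sketch assumes each data source exposes a log of (account, text) entries. It also assumes a hypothetical `account_to_id` lookup that maps (source, account) pairs to the unified identifier described above, and it skips accounts not linked to any identifier. The function name `ids_for_operation` is likewise hypothetical.

```python
def ids_for_operation(source_logs, account_to_id, phrases):
    """Collect identifiers whose accounts' behavior mentions any of the phrases.

    source_logs: dict source -> list of (account, text) behavior entries.
    account_to_id: dict (source, account) -> unified identifier.
    """
    found = set()
    for source, entries in source_logs.items():
        for account, text in entries:
            if any(p in text for p in phrases):
                identifier = account_to_id.get((source, account))
                if identifier is not None:  # skip accounts with no linked identifier
                    found.add(identifier)
    return found
```

Because several accounts can map to one identifier, a user active in more than one source is counted once.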
Optionally, in this embodiment, the initial identifiers may include, without limitation, one or more identifiers. Likewise, the preset feature words may be one or more feature words, and the first target identifiers may include one or more identifiers.
Optionally, in this embodiment, the preset weight may indicate how frequently accounts in the target data source perform the predetermined operation. In other words, the preset weight expresses how much attention those accounts pay to the predetermined operation, and this may be measured, without limitation, by how frequently they perform it. Here, that frequency may refer, without limitation, to how many accounts in the target data source perform the operation regularly. For example, accounts that perform it more than 5 times a day might make up 50% of all accounts in the source. Alternatively, the frequency may be expressed by the significance with which accounts in the target data source perform the predetermined operation. This significance can be determined from historical data, such as the identifiers to which a resource was last pushed. It is the proportion of that historical data made up of initial identifiers that have an account recorded in the target data source.
Optionally, in this embodiment, the preset weight may be set for the target data source according to how frequently its accounts perform the predetermined operation, or computed from that frequency through model training.
It can be seen that, through the above steps, the identifiers corresponding to the predetermined operation are obtained from target data sources that record each identifier's accounts and the operations those accounts performed. This widens the channels for acquiring identifiers and avoids the bias caused by mining identifiers from a single, small-scale user log. Initial identifiers are then preliminarily screened according to the feature information of the identifiers and the preset feature words. A feature parameter is determined for each initial identifier from the preset weights and the feature information, expressing how frequently it performs the predetermined operation. The first target identifiers, whose feature parameters exceed the preset parameter, are then selected from the initial identifiers. As a result, every first target identifier performs the predetermined operation relatively frequently. This improves the accuracy of the identifiers obtained for training and overcomes the low accuracy of the related art.
As an optional solution, obtaining the initial identifiers from the identifiers according to the feature information of the identifiers and the preset feature words includes:
S1: Obtain a first feature word and a second feature word, where the preset feature words include the first feature word and the second feature word;
S2: Obtain the initial identifiers from the identifiers, where the feature information corresponding to an initial identifier carries the first feature word and does not carry the second feature word.
Optionally, in this embodiment, the preset feature words may include, without limitation, a first feature word and a second feature word. The preset feature words represent the characteristics of a class of users and may include positive representation words and negative representation words. The positive representation words (the first feature words above) are keywords in the usual sense (keywords) and characterize the target population. The negative representation words (the second feature words above) are filter words (filter_words), which denoise the match. They remove noise introduced when several words are concatenated, so that the positive representation words characterize the target population more precisely.
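A minimal sketch of the positive/negative word screening follows. It assumes each identifier's feature information is reduced to a list of feature words and that matching is plain substring matching; the real system may use regular expressions or tokenized matching. `keywords` and `filter_words` mirror the names used above, while `initial_ids` is a hypothetical helper.

```python
def initial_ids(features, keywords, filter_words):
    """Keep identifiers whose feature words contain a keyword and no filter word.

    features: dict identifier -> list of feature words from its operations.
    """
    result = []
    for identifier, words in features.items():
        text = " ".join(words)
        has_positive = any(k in text for k in keywords)       # first feature word present
        has_negative = any(f in text for f in filter_words)   # second feature word present
        if has_positive and not has_negative:
            result.append(identifier)
    return result
```

In the test below, "milk powder tea" stands in for a hypothetical noisy concatenation: it contains the keyword but is removed by the filter word.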
Through the above steps, the initial identifiers are obtained from the identifiers according to their feature information and the first and second feature words in the preset feature words. This performs a preliminary screening of the identifiers.
As an optional solution, determining the feature parameter of each initial identifier according to the preset weights and the feature information includes:
S1: Obtain the preset weights, where a larger preset weight indicates that accounts in the target data source perform the predetermined operation more frequently;
S2: Obtain time information and frequency information from the feature information, where the time information indicates when the identifier performed the predetermined operation and the frequency information indicates how many times it performed it;
S3: Determine the feature parameter according to the preset weights, the time information, and the frequency information, where a larger feature parameter indicates that the initial identifier performs the predetermined operation more frequently.
Optionally, in this embodiment, the preset weights may be obtained, without limitation, in one of the following ways:
Way 1: Obtain the proportion of accounts in the target data source that performed the predetermined operation among all accounts in that source, and assign the preset weight to the target data source according to this proportion. A larger proportion receives a larger preset weight.
For example, suppose there are three target data sources, A, B, and C. Source A has 100 accounts, of which 34 performed the predetermined operation. Source B has 200 accounts, of which 25 did, and source C has 100 accounts, of which 56 did. The proportions for A, B, and C are therefore 34%, 12.5%, and 56%. According to these proportions, the preset weights 2, 1, and 3 are assigned to A, B, and C respectively.
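Way 1 can be sketched as below. The text only requires that a larger proportion receive a larger weight. Assigning integer weights by rank, with the smallest share getting 1, is an assumption that happens to reproduce the 2, 1, 3 of the example. `weights_by_share` is a hypothetical helper.

```python
def weights_by_share(stats):
    """stats: dict source -> (accounts_that_performed_op, total_accounts).

    Returns (shares, weights): each source's share, and rank-based integer
    weights where a larger share gets a larger weight (smallest share -> 1).
    """
    shares = {s: done / total for s, (done, total) in stats.items()}
    ordered = sorted(shares, key=shares.get)  # ascending by share
    weights = {s: rank + 1 for rank, s in enumerate(ordered)}
    return shares, weights
```

With the example's counts, the shares come out as 0.34, 0.125, and 0.56, and the weights as A: 2, B: 1, C: 3.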
Way 2: Obtain the number of identifiers shared by a first identifier set and a preset identifier set, where the first identifier set is the set of initial identifiers included in one target data source. Assign the preset weight to the target data source according to the ratio of this number to the number of identifiers in the first identifier set. A larger ratio receives a larger preset weight.
Optionally, in this embodiment, the preset identifier set may be, without limitation, the identifiers from the target data source among the first target identifiers obtained last time. It may also be the identifiers from the target data source among the identifiers to which data was last pushed.
In an optional implementation, the preset identifier set is taken to be the identifiers from each target data source among the first target identifiers obtained last time. The preset identifier sets for target data sources A, B, and C contain 40, 30, and 40 identifiers respectively. The initial identifiers include 20, 40, and 40 identifiers from sources A, B, and C, so first identifier sets A, B, and C contain 20, 40, and 40 identifiers. Matching each first identifier set against the preset identifier set for the same source gives the number of shared identifiers: 10 for A, 5 for B, and 20 for C. According to these numbers of shared identifiers, the preset weights 2, 1, and 3 are assigned to sources A, B, and C respectively.
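The overlap computation for Way 2 can be sketched as follows, using synthetic identifier sets sized as in the example. The names `overlap_counts`, `overlap_ratios`, and `rank_weights` are hypothetical, and rank-based integer weights are an assumption. Applying the ratio rule of Way 2 literally gives A and C the same ratio (10/20 = 20/40 = 0.5). The weights 2, 1, 3 in the example instead follow the raw overlap counts 10, 5, 20, so the sketch exposes both.

```python
def overlap_counts(first_sets, preset_sets):
    """Number of identifiers each source's first set shares with its preset set."""
    return {s: len(first_sets[s] & preset_sets[s]) for s in first_sets}


def overlap_ratios(first_sets, preset_sets):
    """Way 2 ratio: shared identifiers / size of the source's first identifier set."""
    return {s: len(first_sets[s] & preset_sets[s]) / len(first_sets[s]) for s in first_sets}


def rank_weights(scores):
    """Integer weights by ascending score; equal scores receive equal weights."""
    distinct = sorted(set(scores.values()))
    return {s: distinct.index(v) + 1 for s, v in scores.items()}
```

Building the example sets as Python sets makes the intersection sizes exactly 10, 5, and 20. `rank_weights` on the counts then yields A: 2, B: 1, C: 3, while on the ratios it yields A: 2, B: 1, C: 2.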
Optionally, in this embodiment, the feature parameter may be determined as follows. First, compute, for each target data source, the product of the initial identifier's time information and frequency information. Then compute the weighted sum of these products using the preset weights.
In an optional implementation, the feature parameter may be calculated by the following formula:
$$\mathrm{score} = \sum_{i=1}^{n} \mathrm{weight}(\mathrm{source}_i) \cdot \mathrm{time}(\mathrm{source}_i) \cdot \mathrm{sigmoid}\big(\mathrm{action}(\mathrm{source}_i)\big)$$
Here, source denotes a data source, of which there are n, and weight denotes the preset weight of each data source. time denotes the time information above. It may be derived from abs(user behavior time - current mining time), the absolute value of the behavior time difference, and serves as a time-decay parameter for user behavior: the closer the behavior is to the current time, the larger the feature parameter, and the farther away, the smaller. action denotes the frequency information above, that is, how often the user behavior occurs, normalized here with the sigmoid function. The more often the behavior occurs, the higher the feature parameter.
It can be seen that, through the above steps, the feature parameter of each initial identifier is determined according to the preset weights and the feature information, which scores the initial identifier. The score measures how frequently the initial identifier performs the predetermined operation, so the first target identifiers selected from the initial identifiers better represent the predetermined operation. This improves the accuracy of the identifiers obtained for training and overcomes the low accuracy of the related art.
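The scoring formula can be sketched as below for one identifier. The text does not fix how the time difference becomes a decay term, so the exponential decay exp(-|now - t| / tau) with a one-day `tau` is an assumption. The sigmoid normalization of the count follows the description of action above. `feature_score` and its record layout are hypothetical.

```python
import math


def feature_score(records, weights, now, tau=86400.0):
    """Weighted sum over sources of time decay * sigmoid(frequency).

    records: list of (source, behavior_time, count) for one initial identifier.
    weights: dict source -> preset weight.
    tau: assumed decay scale in seconds (one day by default).
    """
    total = 0.0
    for source, t, count in records:
        decay = math.exp(-abs(t - now) / tau)  # closer to now -> closer to 1
        freq = 1.0 / (1.0 + math.exp(-count))  # sigmoid-normalized frequency
        total += weights[source] * decay * freq
    return total
```

Both monotonicity requirements from the text hold in this sketch. A more recent behavior or a higher count can only raise the score, and a source with a larger preset weight contributes more.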
As an optional solution, before the initial identifiers are obtained from the identifiers according to the feature information of the identifiers and the preset feature words, the method further includes:
S1: Obtain, from the predetermined operations corresponding to the identifiers, information representing features of the predetermined operation, including the feature words corresponding to the predetermined operation, the time information, and the frequency information;
S2: Store the feature words, the time information, and the frequency information in a preset format to obtain the feature information.
It can be seen that, through the above steps, the information representing features of the predetermined operation is organized and stored in a preset format. This information is obtained from the predetermined operations corresponding to the identifiers. Storing it this way makes comparison of feature words faster and more convenient.
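One possible preset format is sketched below: one tab-separated line per (identifier, feature word), holding the time and the count. The text does not specify the format, so the field order, the tab separator, and the `FeatureRecord` name are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRecord:
    # Hypothetical preset format: one line per (identifier, feature word).
    identifier: str
    feature_word: str
    timestamp: int  # when the operation occurred (epoch seconds)
    count: int      # how many times it occurred

    def to_line(self) -> str:
        """Serialize as identifier<TAB>feature_word<TAB>timestamp<TAB>count."""
        return "\t".join([self.identifier, self.feature_word,
                          str(self.timestamp), str(self.count)])

    @staticmethod
    def from_line(line: str) -> "FeatureRecord":
        """Parse a line produced by to_line back into a record."""
        identifier, word, ts, cnt = line.rstrip("\n").split("\t")
        return FeatureRecord(identifier, word, int(ts), int(cnt))
```

A flat, fixed-field layout like this lets feature words be compared with simple string operations, without re-parsing raw behavior logs.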
As an optional solution, obtaining the first target identifiers from the initial identifiers includes one of the following:
S1: Sort the initial identifiers by feature parameter from high to low and select the first target identifiers from the sorted identifiers, where the first target identifiers are the top N identifiers after sorting;
S2: Obtain, from the initial identifiers, the first target identifiers whose feature parameter is greater than or equal to a preset value.
Optionally, in this embodiment, the feature parameters may be sorted from high to low. The top N identifiers are then taken as those whose feature parameter is higher than the preset parameter, yielding the first target identifiers.
Optionally, in this embodiment, a preset value may be set, and the identifiers whose feature parameter is greater than or equal to that value are taken as the first target identifiers.
It can be seen that, through the above steps, the first target identifiers are obtained either by sorting the feature parameters from high to low or by setting a preset value. Either way clearly selects, from the initial identifiers, the identifiers that better represent the predetermined operation.
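Both selection options can be sketched in a few lines, assuming the feature parameters are held in a dict from identifier to score. The helper names are hypothetical.

```python
def select_top_n(scores, n):
    """Option S1: sort identifiers by feature parameter, high to low, and keep the first n."""
    return sorted(scores, key=lambda i: scores[i], reverse=True)[:n]


def select_by_threshold(scores, preset_value):
    """Option S2: keep identifiers whose feature parameter is >= preset_value."""
    return [i for i, s in scores.items() if s >= preset_value]
```

Top-N fixes the size of the sample set, which is convenient when the negative set must match it in size. A threshold instead fixes the minimum quality, and the number of selected identifiers can vary from run to run.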
作为一种可选的方案,在从初始标识中获取第一目标标识之后,还包括:As an optional solution, after obtaining the first target identifier from the initial identifier, the method further includes:
S1,将第一目标标识与预设目标标识进行匹配;S1, matching the first target identifier with the preset target identifier;
S2,在第一目标标识与预设目标标识匹配成功的情况下,确定出第一目标标识为所需的标识;在第一目标标识与预设目标标识匹配不成功的情况下,重新获取第一目标标识。S2. If the first target identifier and the preset target identifier are successfully matched, determine that the first target identifier is the required identifier; if the first target identifier and the preset target identifier are unsuccessful, re-acquire the first A target identifier.
可选地,在本实施例中,可以通过以下方式对第一目标标识与预设目标标识进行匹配:判断第一目标标识与预设目标标识中是否包括大于或者等于预设数量的相同标识,并在判断出第一目标标识与预设目标标识中包括大于或者等于预设数量的相同标识的情况下,确定第一目标标识与预设目标标识匹配成功。Optionally, in this embodiment, the first target identifier is matched with the preset target identifier by determining whether the first target identifier and the preset target identifier include the same identifier that is greater than or equal to the preset number. If it is determined that the first target identifier and the preset target identifier include the same identifier that is greater than or equal to the preset number, the first target identifier is determined to be successfully matched with the preset target identifier.
可选地,在本实施例中,预设目标标识可以是上一次获取的第一目标标识,还可以是预先设定的目标标识。Optionally, in this embodiment, the preset target identifier may be the first target identifier acquired last time, and may also be a preset target identifier.
可选地,在本实施例中,重新获取第一目标标识时可以但不限于通过重新设定预定操作来重新获取预定操作对应的标识从而获取第一目标标识。还可以但不限于通过重新为目标数据源分配预设权重来重新获取第一目标标识。Optionally, in this embodiment, when the first target identifier is re-acquired, the first target identifier may be obtained by, but not limited to, re-acquiring the identifier corresponding to the predetermined operation by resetting the predetermined operation. It is also possible, but not limited to, to reacquire the first target identity by re-assigning the target weight to the target data source.
It can be seen that, through the above steps, the first target identifier is matched against the preset target identifier. If the match succeeds, the currently acquired first target identifier can be considered to meet the needs of model training; that is, it is the required identifier. Conversely, if the match fails, the currently acquired first target identifier does not meet the needs of model training, and it may be re-acquired.
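The matching rule above can be sketched in Python as a set-intersection test, assuming each first target identifier and preset target identifier is held as a set of string identifiers. The `acquire` callback is a hypothetical stand-in for whichever re-acquisition strategy is used (resetting the predetermined operation or re-assigning the preset weights); the function names and the `max_attempts` cap are illustrative assumptions, not part of the source.

```python
from typing import Callable, Optional, Set


def matches_preset(first_target: Set[str], preset_target: Set[str], preset_count: int) -> bool:
    """Matching succeeds when the two sets share at least `preset_count` identifiers."""
    return len(first_target & preset_target) >= preset_count


def acquire_required(
    acquire: Callable[[int], Set[str]],
    preset_target: Set[str],
    preset_count: int,
    max_attempts: int = 3,
) -> Optional[Set[str]]:
    """Re-acquire the first target identifier until it matches, or give up.

    `acquire(attempt)` is a hypothetical callback returning a fresh candidate set,
    e.g. after resetting the predetermined operation or re-assigning weights.
    """
    for attempt in range(max_attempts):
        candidate = acquire(attempt)
        if matches_preset(candidate, preset_target, preset_count):
            return candidate  # accepted as the required identifier
    return None  # no attempt met the overlap threshold


# Example: the previous run's result acts as the preset target identifier.
previous = {"u2", "u3", "u9"}
print(matches_preset({"u1", "u2", "u3"}, previous, preset_count=2))  # True
```

Returning `None` instead of looping forever is a design choice of this sketch; the source only says the identifier "may be re-acquired".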
As an optional solution, after the first target identifier is obtained from the initial identifiers, the method further includes:
S1: obtain the identifiers corresponding to the accounts included in the multiple data sources;
S2: randomly select, from the identifiers corresponding to the accounts included in the multiple data sources, identifiers other than the first target identifier to obtain a second target identifier, where the second target identifier contains the same number of identifiers as the first target identifier.
Optionally, in this embodiment, the first target identifier may serve as the positive samples for model training; after the first target identifier is obtained, the second target identifier may be drawn from all identifiers of the multiple data sources to serve as the negative samples.
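The negative-sampling step can be sketched as follows, assuming all identifiers from the data sources are available as a set of strings. The function name and the optional `seed` parameter are illustrative assumptions; the sketch simply removes the positives and draws, without replacement, as many identifiers as there are positives.

```python
import random
from typing import Optional, Set


def sample_negatives(all_ids: Set[str], positives: Set[str], seed: Optional[int] = None) -> Set[str]:
    """Draw as many negatives as there are positives, excluding every positive."""
    candidates = sorted(all_ids - positives)  # sorted so a fixed seed is reproducible
    if len(candidates) < len(positives):
        raise ValueError("not enough non-positive identifiers to match the positive count")
    rng = random.Random(seed)
    return set(rng.sample(candidates, len(positives)))


all_ids = {f"u{i}" for i in range(10)}
positives = {"u0", "u1", "u2"}
negatives = sample_negatives(all_ids, positives, seed=7)
print(len(negatives), negatives.isdisjoint(positives))  # 3 True
```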
As an optional solution, after identifiers other than the first target identifier are randomly selected from the identifiers corresponding to the accounts included in the multiple data sources to obtain the second target identifier, the method further includes:
S1: train a prediction model using the first target identifier and the second target identifier;
S2: use the prediction model to select, from the identifiers included in the multiple data sources, the to-be-pushed identifiers for a to-be-pushed resource;
S3: push the to-be-pushed resource to the to-be-pushed identifiers.
Optionally, in this embodiment, the acquired first and second target identifiers may be used to train the prediction model, so that the to-be-pushed identifiers obtained through the model more accurately represent the population targeted by the predetermined operation. This in turn makes resource pushing more efficient.
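The source does not specify the type of prediction model. As one possibility, the sketch below assumes a logistic regression trained with plain stochastic gradient descent on hypothetical per-identifier feature vectors (for example, one feature parameter per data source), then ranks all identifiers by predicted probability and keeps the top `k` as the to-be-pushed identifiers. All names, feature values, and hyperparameters are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def _sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: List[Sequence[float]], y: List[int], lr: float = 0.5, epochs: int = 300) -> List[float]:
    """Plain SGD logistic regression; the last weight is the bias term."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            g = p - yi  # gradient of the log loss with respect to the logit
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w


def predict(w: List[float], x: Sequence[float]) -> float:
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def select_push_targets(w: List[float], features: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Rank all identifiers by predicted probability and keep the top k."""
    return sorted(features, key=lambda i: predict(w, features[i]), reverse=True)[:k]


# Positives (first target identifier) get label 1, sampled negatives label 0.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, 0, 0]
w = train_logistic(X, y)
to_push = select_push_targets(w, {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.8, 0.0]}, k=2)
print(sorted(to_push))  # ['a', 'c']
```

In practice a library model would replace this hand-written trainer; the point here is only the shape of the flow: positives and negatives in, ranked identifiers out.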
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
From the description of the above implementations, those skilled in the art can clearly understand that the method according to the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present invention in essence, or the part contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
According to an embodiment of the present invention, an identifier acquisition apparatus for implementing the above identifier acquisition method is further provided. As shown in FIG. 4, the apparatus includes:
1) a first obtaining module 42, configured to obtain identifiers corresponding to a predetermined operation from multiple data sources, where a target data source among the multiple data sources records the accounts corresponding to the identifiers and the operations those accounts have performed;
2) a second obtaining module 44, configured to obtain initial identifiers from the identifiers according to the feature information of the identifiers and preset feature words, where the feature information represents features of the predetermined operation;
3) a determining module 46, configured to determine a feature parameter of each initial identifier according to a preset weight and the feature information, where the preset weight corresponds to the target data source and indicates how frequently accounts in the target data source perform the predetermined operation, and the feature parameter indicates how frequently the initial identifier performs the predetermined operation;
4) a third obtaining module 48, configured to obtain a first target identifier from the initial identifiers, where the first target identifier is the set of initial identifiers whose feature parameters are higher than a preset parameter.
Optionally, in this embodiment, the identifier acquisition apparatus may be applied to, but is not limited to, scenarios in which identifier samples are acquired for model training and the training results are used to push resources to clients. The client may be, but is not limited to, any of various types of software, for example search, social networking, instant messaging, news, game, or shopping software. For example, the apparatus may be applied to a scenario in which identifier samples are acquired for model training and the training results are used to push resources to clients of shopping software, or to clients of search software, thereby acquiring identifier samples. The above is only an example and is not limited in this embodiment.
Optionally, in this embodiment, the multiple data sources may be various platforms, software, websites, applications, and the like, for example social applications, search engines, e-commerce websites, and advertising platforms.
Optionally, in this embodiment, one identifier may correspond to different accounts in different data sources. For example, a user may have registered accounts on several applications: account A on a social platform, account B on a shopping website, and account C on an instant messaging application. If the user associates these three accounts, accounts A, B, and C can correspond to the same identifier, which uniquely identifies the user.
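The many-accounts-to-one-identifier relationship described above can be modeled as a lookup keyed by `(data source, account)`. The class and method names below are hypothetical and only illustrate the data structure; the source does not describe how associations are stored.

```python
from typing import Dict, Optional, Tuple


class IdentifierRegistry:
    """Maps (data_source, account) pairs to a single user identifier."""

    def __init__(self) -> None:
        self._by_account: Dict[Tuple[str, str], str] = {}

    def link(self, identifier: str, source: str, account: str) -> None:
        # Associate an account with an identifier; one identifier may own many accounts.
        self._by_account[(source, account)] = identifier

    def lookup(self, source: str, account: str) -> Optional[str]:
        return self._by_account.get((source, account))


registry = IdentifierRegistry()
registry.link("uid-1", "social", "A")
registry.link("uid-1", "shopping", "B")
registry.link("uid-1", "im", "C")
print(registry.lookup("shopping", "B"))  # uid-1
```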
Optionally, in this embodiment, the target data source may include one or more data sources. That is, a data source records, for each identifier, the corresponding account in that data source and the operations that account has performed. An identifier corresponding to the predetermined operation may be recorded in just one of the multiple data sources, or in several of them.
Optionally, in this embodiment, the predetermined operation may be a behavior that an identifier has performed, or a phrase that characterizes that behavior. For example, if the users to be mined are users who buy maternal and infant products, the predetermined operation may be "clicking an entry containing milk powder or diapers", or phrases such as "milk powder" and "diapers". To obtain the identifiers corresponding to this predetermined operation from the multiple data sources, one may first collect the accounts that searched for "milk powder" or "diapers" in a search engine, the accounts that bought milk powder or diapers on a shopping website, the accounts that sent messages containing phrases such as "milk powder" or "diapers" in instant messaging software, and the accounts that clicked entries containing milk powder or diapers in any of the data sources, and then obtain the identifiers corresponding to those accounts.
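The collection step in the example above can be sketched as a scan over each data source's records, assuming (hypothetically) that every record is an `(account, free-text action)` pair and that an account-to-identifier mapping keyed by `(source, account)` is available. Plain substring matching stands in for whatever phrase matching is actually used.

```python
from typing import Dict, Iterable, Set, Tuple

# (account, free-text description of one action) as recorded by a data source
LogRecord = Tuple[str, str]


def ids_for_operation(
    logs: Dict[str, Iterable[LogRecord]],
    account_to_id: Dict[Tuple[str, str], str],
    phrases: Set[str],
) -> Set[str]:
    """Collect identifiers whose account, in any source, performed an action mentioning a phrase."""
    found: Set[str] = set()
    for source, records in logs.items():
        for account, action in records:
            if any(p in action for p in phrases):
                identifier = account_to_id.get((source, account))
                if identifier is not None:  # skip accounts not linked to an identifier
                    found.add(identifier)
    return found


logs = {
    "search": [("s1", "searched milk powder brands"), ("s2", "searched football scores")],
    "shopping": [("b7", "bought diapers size 2")],
    "im": [("c3", "sent: where to buy diapers?")],
}
account_to_id = {("search", "s1"): "uid-1", ("search", "s2"): "uid-2",
                 ("shopping", "b7"): "uid-3", ("im", "c3"): "uid-1"}
print(sorted(ids_for_operation(logs, account_to_id, {"milk powder", "diapers"})))
# ['uid-1', 'uid-3']
```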
Optionally, in this embodiment, the initial identifiers may include, but are not limited to, one or more identifiers. The preset feature words may be, but are not limited to, one or more feature words. The first target identifier may include, but is not limited to, one or more identifiers.
Optionally, in this embodiment, the preset weight may indicate how frequently accounts in the target data source perform the predetermined operation. In other words, the preset weight may represent how much attention accounts in the target data source pay to the predetermined operation, and this degree of attention may be expressed by, but is not limited to, the frequency with which those accounts perform the operation. Here, that frequency may refer to, but is not limited to, how many accounts in the target data source perform the predetermined operation often (for example, accounts that perform it more than 5 times per day make up 50% of all accounts in the data source). Alternatively, the frequency may be represented by, but is not limited to, the significance with which accounts in the target data source perform the predetermined operation. This significance may be determined by calculating what proportion of the initial identifiers that have an account recorded in the target data source also appear in historical data (for example, the identifiers to which a resource was last pushed).
Optionally, in this embodiment, the preset weight may be set for the target data source according to how frequently its accounts perform the predetermined operation, or it may be computed from that frequency through model training.
It can be seen that, with the above apparatus, the target data source records the accounts corresponding to identifiers and the operations those accounts have performed, and the identifiers corresponding to the predetermined operation are obtained from it. This broadens the channels through which identifiers are obtained and avoids the bias that arises when identifiers are mined from a single, small user log. Initial identifiers are then preliminarily screened according to the identifiers' feature information and the preset feature words; a feature parameter is determined for each initial identifier according to the preset weight and the feature information, expressing how frequently that identifier performs the predetermined operation; and the first target identifier, whose feature parameters are higher than the preset parameter, is obtained from the initial identifiers. As a result, every identifier in the first target identifier performs the predetermined operation frequently, which improves the accuracy of the identifiers acquired for training and overcomes the low accuracy of acquiring training identifiers in the related art.
As an optional solution, as shown in FIG. 5, the second obtaining module 44 includes:
1) a first obtaining unit 52, configured to obtain a first feature word and a second feature word, where the preset feature words include the first feature word and the second feature word;
2) a second obtaining unit 54, configured to obtain the initial identifiers from the identifiers, where the feature information corresponding to each initial identifier contains the first feature word and does not contain the second feature word.
Optionally, in this embodiment, the preset feature words may include, but are not limited to, the first feature word and the second feature word. Preset feature words represent the characteristics of a type of user population and may include positive representation words and negative representation words. Positive representation words (the first feature words above) are keywords in the everyday sense and characterize the target population. Negative representation words (the second feature words above) are filter words (filter_words); their role is denoising, that is, removing the noise introduced when multiple words are concatenated, so that the positive representation words characterize the target population more precisely.
With the above apparatus, the initial identifiers are obtained from the identifiers according to the identifiers' feature information and the first and second feature words included in the preset feature words, which performs a preliminary screening of the identifiers.
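The keyword/filter-word screening can be sketched as follows, assuming (for illustration only) that each identifier's feature information is a single text string and that words are matched by substring. The "adult diaper" filter word is a hypothetical example of a multi-word concatenation that would be noise when targeting a maternal and infant audience.

```python
from typing import Dict, Set


def screen_initial(feature_info: Dict[str, str], keywords: Set[str], filter_words: Set[str]) -> Set[str]:
    """Keep identifiers whose feature text has a keyword and no filter word."""
    initial: Set[str] = set()
    for identifier, text in feature_info.items():
        has_keyword = any(k in text for k in keywords)     # first feature word present
        has_filter = any(f in text for f in filter_words)  # second feature word present
        if has_keyword and not has_filter:
            initial.add(identifier)
    return initial


feature_info = {
    "u1": "searched newborn diaper",
    "u2": "bought adult diaper",  # keyword hit, but noise for a maternal/infant audience
    "u3": "searched football scores",
}
print(screen_initial(feature_info, {"diaper"}, {"adult diaper"}))  # {'u1'}
```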
As an optional solution, as shown in FIG. 6, the determining module 46 includes:
1) a third obtaining unit 62, configured to obtain the preset weight, where a larger preset weight indicates that accounts in the target data source perform the predetermined operation more frequently;
2) a fourth obtaining unit 64, configured to obtain time information and frequency information from the feature information, where the time information indicates when the identifier performed the predetermined operation and the frequency information indicates how many times the identifier performed it;
3) a determining unit 66, configured to determine the feature parameter according to the preset weight, the time information, and the frequency information, where a larger feature parameter indicates that the initial identifier performs the predetermined operation more frequently.
Optionally, in this embodiment, the third obtaining unit 62 is configured to perform one of the following:
obtain the proportion of accounts in the target data source that have performed the predetermined operation among all accounts in that data source, and assign the preset weight to the target data source according to that proportion, where a data source with a larger proportion is assigned a larger preset weight;
obtain the number of identifiers shared by a first identifier set and a preset identifier set, where the first identifier set is the set of initial identifiers included in one target data source, and assign the preset weight to the target data source according to the ratio of that number to the number of identifiers in the first identifier set, where a data source with a larger ratio is assigned a larger preset weight.
For example, suppose there are three target data sources: A, B, and C. Target data source A has 100 accounts, 34 of which have performed the predetermined operation; target data source B has 200 accounts, 25 of which have performed it; and target data source C has 100 accounts, 56 of which have performed it. The proportions for A, B, and C are therefore 34%, 12.5%, and 56%, and according to these proportions A, B, and C are assigned preset weights of 2, 1, and 3, respectively.
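The worked example above can be reproduced with a short Python sketch. The text does not say how proportions are turned into the integer weights 2, 1, and 3; the sketch assumes a dense ranking (the smallest proportion gets weight 1, equal proportions share a weight), which reproduces the example. `dense_rank` and `proportion_weights` are hypothetical names.

```python
from typing import Dict


def dense_rank(scores: Dict[str, float]) -> Dict[str, int]:
    """Smallest score gets weight 1; equal scores share a weight (assumed mapping)."""
    levels = {v: r for r, v in enumerate(sorted(set(scores.values())), start=1)}
    return {k: levels[v] for k, v in scores.items()}


def proportion_weights(executed: Dict[str, int], totals: Dict[str, int]) -> Dict[str, int]:
    """Weight each source by the share of its accounts that performed the operation."""
    proportions = {s: executed[s] / totals[s] for s in totals}  # 0.34, 0.125, 0.56
    return dense_rank(proportions)


executed = {"A": 34, "B": 25, "C": 56}
totals = {"A": 100, "B": 200, "C": 100}
print(proportion_weights(executed, totals))  # {'A': 2, 'B': 1, 'C': 3}
```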
Optionally, in this embodiment, the preset identifier set may be, but is not limited to, the identifiers from the target data source included in the previously acquired first target identifier, or the identifiers from the target data source among the identifiers to which data was previously pushed.
In an optional implementation, take as the preset identifier sets the identifiers from each target data source included in the previously acquired first target identifier. Preset identifier set A for target data source A contains 40 identifiers, preset identifier set B for target data source B contains 30, and preset identifier set C for target data source C contains 40. The initial identifiers include 20, 40, and 40 identifiers from target data sources A, B, and C, respectively, so first identifier set A contains 20 identifiers, first identifier set B contains 40, and first identifier set C contains 40. Matching first identifier set A against preset identifier set A yields 10 shared identifiers; matching first identifier set B against preset identifier set B yields 5; and matching first identifier set C against preset identifier set C yields 20. According to these numbers of shared identifiers, target data sources A, B, and C are assigned preset weights of 2, 1, and 3, respectively.
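The overlap computation in this example can be sketched with Python sets. Note that the rule stated for the third obtaining unit 62 uses the ratio of shared identifiers to the size of the first identifier set, which here gives 10/20 = 0.5, 5/40 = 0.125, and 20/40 = 0.5, so A and C tie; the example's weights 2, 1, 3 instead follow the raw counts 10, 5, 20. The sketch computes both so the difference is visible. The dense-rank mapping to integer weights and all names are assumptions.

```python
from typing import Dict, Set, Tuple


def dense_rank(scores: Dict[str, float]) -> Dict[str, int]:
    """Smallest score gets weight 1; equal scores share a weight (assumed mapping)."""
    levels = {v: r for r, v in enumerate(sorted(set(scores.values())), start=1)}
    return {k: levels[v] for k, v in scores.items()}


def overlap_stats(
    first_sets: Dict[str, Set[str]], preset_sets: Dict[str, Set[str]]
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Per source: number of shared identifiers and its ratio to the first set's size."""
    counts = {s: len(first_sets[s] & preset_sets[s]) for s in first_sets}
    ratios = {s: counts[s] / len(first_sets[s]) for s in first_sets}
    return counts, ratios


def ids(prefix: str, n: int) -> Set[str]:
    return {f"{prefix}{i}" for i in range(n)}


# Sizes from the example: presets 40/30/40, first sets 20/40/40, overlaps 10/5/20.
preset_sets = {"A": ids("a", 40), "B": ids("b", 30), "C": ids("c", 40)}
first_sets = {"A": ids("a", 10) | ids("x", 10),
              "B": ids("b", 5) | ids("y", 35),
              "C": ids("c", 20) | ids("z", 20)}
counts, ratios = overlap_stats(first_sets, preset_sets)
print(counts)              # {'A': 10, 'B': 5, 'C': 20}
print(ratios)              # {'A': 0.5, 'B': 0.125, 'C': 0.5}
print(dense_rank(counts))  # {'A': 2, 'B': 1, 'C': 3}
print(dense_rank(ratios))  # {'A': 2, 'B': 1, 'C': 2}
```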
Optionally, in this embodiment, the determining unit 66 is configured to: calculate, for each target data source, the product of the time information and the frequency information corresponding to the initial identifier; and calculate the weighted sum of these products according to the preset weights to obtain the feature parameter.
In an optional implementation, the feature parameter may be calculated by the following formula:
Here, source denotes a data source, of which there are n; weight denotes the preset weight of each data source; time denotes the time information above, which may be expressed as abs(user behavior time - current mining time), that is, the absolute value of the behavior's time difference. It serves as a time decay parameter for user behavior: the closer the behavior is to the current time, the larger the feature parameter, and the farther away, the smaller. action denotes the frequency information above and represents how often the user behavior occurs; it is normalized with a sigmoid function, so the more frequent the behavior, the higher the feature parameter.
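The formula itself is not reproduced in this text. Based on the description (a weighted sum over the n data sources of the product of a time term and a sigmoid-normalized frequency term), one plausible form is score = Σ weight_source × decay(time_source) × sigmoid(action_source). The decay is an assumption: because the text says the score should rise as the behavior gets closer to the current time, the sketch uses 1 / (1 + Δt), with Δt = |behavior time - mining time| in days, rather than the raw absolute difference. The dataclass and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SourceBehavior:
    source: str        # data source name
    days_since: float  # |behavior time - current mining time|, in days (assumed unit)
    count: int         # behavior frequency recorded in this source


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def feature_score(behaviors: List[SourceBehavior], weights: Dict[str, float]) -> float:
    """Weighted sum over sources of (time decay x sigmoid-normalized frequency)."""
    total = 0.0
    for b in behaviors:
        time_factor = 1.0 / (1.0 + b.days_since)  # assumed decay: recent behavior scores higher
        total += weights[b.source] * time_factor * sigmoid(b.count)
    return total


weights = {"A": 2.0, "B": 1.0, "C": 3.0}
recent = feature_score([SourceBehavior("A", 0.0, 5)], weights)
stale = feature_score([SourceBehavior("A", 10.0, 5)], weights)
print(recent > stale)  # True
```

An exponential decay such as exp(-λΔt) would fit the description equally well; the choice only changes how quickly old behavior stops counting.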
It can be seen that, with the above apparatus, the feature parameter of each initial identifier is determined according to the preset weight and the feature information, which scores the initial identifier. This score measures how frequently the initial identifier performs the predetermined operation, so the first target identifier selected from the initial identifiers better represents the predetermined operation. This improves the accuracy of the identifiers acquired for training and overcomes the low accuracy of acquiring training identifiers in the related art.
Optionally, in this embodiment, the apparatus further includes:
a sixth obtaining module, configured to obtain, from the predetermined operation corresponding to an identifier, information representing the features of the predetermined operation, where this information includes the feature words corresponding to the predetermined operation, time information, and frequency information;
a storage module, configured to store the feature words, time information, and frequency information in a preset format to obtain the feature information.
It can be seen that, with the above apparatus, the information representing the features of the predetermined operation, obtained from the predetermined operation corresponding to each identifier, is organized and stored in a preset format, which makes comparing feature words faster and more convenient.
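The source does not define the preset format. Purely as an illustration, the sketch below assumes one tab-separated line per record holding the identifier, feature word, behavior timestamp, and frequency, with a round-trip parser. The field layout and the class name are hypothetical, and the format assumes no field contains a tab.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRecord:
    identifier: str
    feature_word: str
    timestamp: int  # Unix seconds of the behavior (assumed representation)
    count: int      # how many times the behavior occurred

    def to_line(self) -> str:
        # Assumed preset format: one tab-separated line per record.
        return "\t".join([self.identifier, self.feature_word, str(self.timestamp), str(self.count)])

    @classmethod
    def from_line(cls, line: str) -> "FeatureRecord":
        identifier, word, ts, count = line.rstrip("\n").split("\t")
        return cls(identifier, word, int(ts), int(count))


rec = FeatureRecord("uid-1", "diapers", 1500000000, 3)
line = rec.to_line()
print(repr(line))  # 'uid-1\tdiapers\t1500000000\t3'
```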
As an optional solution, as shown in FIG. 7, the third obtaining module 48 includes one of the following:
1) a processing unit 72, configured to sort the initial identifiers by feature parameter in descending order and select the first target identifier from the sorted identifiers, where the first target identifier consists of the top N identifiers in the sorted order;
2) a fifth obtaining unit 74, configured to obtain, from the initial identifiers, the first target identifier, whose feature parameter values are greater than or equal to a preset value.
Optionally, in this embodiment, the feature parameters may be sorted in descending order, and the top N identifiers are treated as the identifiers whose feature parameters are higher than the preset parameter, giving the first target identifier.
Optionally, in this embodiment, a preset value may be set, and the identifiers whose feature parameters are greater than or equal to that value are taken as the first target identifier.
It can be seen that, with the above apparatus, obtaining the first target identifier either by sorting the feature parameters in descending order or by setting a preset value makes it possible to select clearly, from the initial identifiers, the identifiers that better represent the predetermined operation.
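Both selection strategies can be sketched in a few lines, assuming the feature parameters are held in a dictionary from identifier to score. Breaking ties by identifier in the top-N variant is an assumption made so the result is deterministic; the function names are illustrative.

```python
from typing import Dict, List, Set


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Sort by feature parameter, highest first, and keep the first n identifiers."""
    # Ties are broken by identifier so the result is deterministic (an assumption).
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


def at_least(scores: Dict[str, float], preset_value: float) -> Set[str]:
    """Keep identifiers whose feature parameter is >= the preset value."""
    return {i for i, s in scores.items() if s >= preset_value}


scores = {"u1": 0.9, "u2": 0.4, "u3": 0.7}
print(top_n(scores, 2))               # ['u1', 'u3']
print(sorted(at_least(scores, 0.7)))  # ['u1', 'u3']
```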
As an optional solution, as shown in FIG. 8, the above apparatus further includes:
1) a matching module 82, configured to match the first target identifier against the preset target identifier;
2) a processing module 84, configured to determine that the first target identifier is the required identifier if it matches the preset target identifier, and to re-acquire the first target identifier if the match fails.
Optionally, in this embodiment, the matching module 82 is configured to: determine whether the first target identifier and the preset target identifier share a number of identical identifiers greater than or equal to a preset number; and, if they do, determine that the first target identifier matches the preset target identifier.
Optionally, in this embodiment, the preset target identifier may be the first target identifier acquired the previous time, or a target identifier set in advance.
Optionally, in this embodiment, the first target identifier may be re-acquired by, but not limited to, resetting the predetermined operation and re-acquiring the identifiers corresponding to it. It may also be re-acquired by, but not limited to, re-assigning the preset weights to the target data sources.
It can be seen that, with the above apparatus, the first target identifier is matched against the preset target identifier. If the match succeeds, the currently acquired first target identifier can be considered to meet the needs of model training; that is, it is the required identifier. Conversely, if the match fails, the currently acquired first target identifier does not meet the needs of model training, and it may be re-acquired.
As an optional solution, as shown in FIG. 9, the above apparatus further includes:
1) a fourth obtaining module 92, configured to obtain the identifiers corresponding to the accounts included in the multiple data sources;
2) a fifth obtaining module 94, configured to randomly select, from the identifiers corresponding to the accounts included in the multiple data sources, identifiers other than the first target identifier to obtain a second target identifier, where the second target identifier contains the same number of identifiers as the first target identifier.
Optionally, in this embodiment, the first target identifier may serve as the positive samples for model training; after the first target identifier is obtained, the second target identifier may be drawn from all identifiers of the multiple data sources to serve as the negative samples.
Optionally, in this embodiment, the above apparatus further includes:
a training module, configured to train a prediction model according to the first target identifier and the second target identifier;
a seventh obtaining module, configured to use the prediction model to select, from the identifiers included in the multiple data sources, the to-be-pushed identifiers for a to-be-pushed resource;
a push module, configured to push the to-be-pushed resource to the to-be-pushed identifiers.
Optionally, in this embodiment, the obtained first target identifier and second target identifier may be used to train the prediction model, so that the to-be-pushed identifiers obtained through the prediction model more accurately represent the population targeted by the predetermined operation, which in turn makes resource pushing more efficient.
The application environment of this embodiment of the present invention may be, but is not limited to, the application environment in Embodiment 1, which is not repeated here. This embodiment of the present invention provides an optional specific application example for implementing the foregoing identifier obtaining method.
As an optional embodiment, the foregoing identifier obtaining method may be applied to, but is not limited to, the identifier-obtaining scenario shown in FIG. 10. A plurality of data sources provide data to a server. The server obtains the first target identifier and the second target identifier from the data obtained from the data sources, trains a prediction model using the first target identifier and the second target identifier, uses the trained prediction model to select, from all identifiers, the identifiers for a to-be-pushed resource, and pushes the resource to the clients logged in with the selected identifiers.
In an optional implementation, the plurality of data sources may cover domains such as social networking, search, e-commerce, advertising, and mobile applications (apps). The behavior of the users of the identifiers in these domains is used as the feature information of the identifiers, and text semantic mining is used to find the primary-selection population in each vertical industry. The preset weights are obtained by verifying the significance of the historical effect in each target data source through matching identical identifiers between the first identifier set and the preset identifier set, and the primary-selection identifiers are ranked according to the preset weights together with frequency information (for example, user behavior frequency) and time information (for example, a time decay factor). The top N identifiers are selected to obtain the first target identifier, and matching the first target identifier against the preset target identifier cross-validates the significance of the historical effect, so that the positive samples of the training data can be selected effectively. The selected positive sample set is then subtracted from the overall active population, and a second target identifier of the same size is randomly drawn from the remainder as the negative sample set. In this way, the server obtains the first target identifier and the second target identifier.
In this implementation, the positive and negative training samples are obtained through text semantic feature mining, which fuses users' multiple behavior features in the social, search, e-commerce, advertising, and mobile app domains. A user behavior frequency factor (the frequency information above), a behavior time decay factor (the time information above), and validation of the historical effect of users' different behaviors are then used to give users different behavior weight factors (the preset weights above). Combining these elements, each user is scored (the feature parameter obtained above) and ranked. The ranking makes it possible to assess the purity of the positive samples (the first target identifier) and to freely select the top N identifiers as positive training samples as needed. This addresses the problems of single-source user behavior and low positive-sample purity.
In this embodiment, users' behavior features across multiple Internet scenarios can be fused to mine the identifiers of user populations with a specific characteristic meaning, and positive and negative samples of higher purity are obtained through verification.
To meet the above requirements, the server in this embodiment may include the following functional modules:
1) A feature representation word collection module, configured to define feature representation words (corresponding to the preset feature words above) according to the characteristics of the identifiers of the specific population to be selected. These include positive representation words (corresponding to the first feature word above) and negative representation words (corresponding to the second feature word above). Positive representation words are keywords in the usual sense; negative representation words are filter words (filter_words). The negative representation words serve for denoising, that is, removing noise introduced by certain multi-word combinations, so that the positive representation words better characterize the target population.
2) A user multi-behavior feature fusion module, configured to extract, from users' various behavior records in the social, search, e-commerce, advertising, and mobile app domains, the key elements (user identifier, feature description string, time information, frequency information).
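One way to picture the fusion module's output is the Python sketch below. It assumes raw behavior events arrive as hypothetical `(source, user_id, text, timestamp)` tuples, keeps the data source in each record so that per-source scoring remains possible, and takes the most recent timestamp as the time information and the event count as the frequency information; these choices are illustrative, not the patent's specified format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class BehaviorRecord:
    source: str       # data source the behavior came from
    user_id: str      # user identifier
    text: str         # feature description string
    last_time: float  # time information: most recent occurrence (epoch seconds)
    frequency: int    # frequency information: number of occurrences


def fuse_events(events: Iterable[Tuple[str, str, str, float]]) -> List[BehaviorRecord]:
    """Collapse raw (source, user_id, text, timestamp) events into one record per
    (source, user_id, text), keeping the latest timestamp and the occurrence count."""
    grouped: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    for source, user_id, text, ts in events:
        grouped[(source, user_id, text)].append(ts)
    return [
        BehaviorRecord(src, uid, txt, max(times), len(times))
        for (src, uid, txt), times in grouped.items()
    ]


events = [
    ("search", "u1", "car loan rates", 100.0),
    ("search", "u1", "car loan rates", 200.0),
    ("ecommerce", "u2", "car seat covers", 150.0),
]
for record in fuse_events(events):
    print(record)
```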
3) A pattern matching module, configured to search, by pattern matching over the users' behavior data (user identifier, feature description string, time information, frequency information) produced by the user multi-behavior feature fusion module, for user identifiers whose data contains a positive representation word from the feature representation word collection module but no negative representation word, and to take these user identifiers as the primary-selection identifiers (the initial identifiers above).
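The keyword-plus-filter-word matching can be sketched with Python's `re` module as follows. The sketch assumes the test is applied per behavior record (a user is selected if at least one of their feature strings contains a keyword and no filter word); the function names and the example words are hypothetical.

```python
import re
from typing import Iterable, Optional, Set, Tuple


def _alternation(words: Iterable[str]) -> Optional[re.Pattern]:
    """Compile words into one literal alternation, or None if there are no words."""
    words = [w for w in words if w]
    if not words:
        return None  # an empty pattern would match every string
    return re.compile("|".join(re.escape(w) for w in words))


def primary_selection(records: Iterable[Tuple[str, str]],
                      keywords: Iterable[str],
                      filter_words: Iterable[str]) -> Set[str]:
    """records are (user_id, feature description string) pairs."""
    positive = _alternation(keywords)
    negative = _alternation(filter_words)
    if positive is None:
        return set()
    selected = set()
    for user_id, text in records:
        # Keep the record only if it hits a keyword and no filter word.
        if positive.search(text) and (negative is None or not negative.search(text)):
            selected.add(user_id)
    return selected


records = [
    ("u1", "compare car loan rates"),
    ("u2", "car loan scam warning"),
    ("u3", "weather today"),
]
print(primary_selection(records, ["car loan"], ["car loan scam"]))  # {'u1'}
```

`re.escape` treats each word as a literal string, so representation words containing regex metacharacters do not change the pattern's meaning.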
4) A user scoring module, configured to score the primary-selection identifiers from the pattern matching module (that is, to obtain the feature parameters). Scoring has two parts: computing a preset weight for each data source, and, within each data source, computing the behavior score of each primary-selection identifier. The weight can be computed in two ways. In the first, the population is split into packages by data source, and the significance of the population package on each individual target data source is verified by matching identical identifiers between the first identifier set and the preset identifier set; the preset weight of each data source is then assigned according to its relative significance. In the second, the final preset weights of the data sources are obtained through model training, for example with logistic regression (LR): each data source is first given an initial weight value, then the model is trained on a small-scale set of preliminarily selected positive and negative samples with each data source as a feature, and after the iterations converge, the model outputs the preset weight of each data source.
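The first weighting method (overlap with the preset identifier set, measured per data source) might look like the Python sketch below. The ratio follows the description given for the processor 201 (shared identifiers divided by the size of the source's first identifier set); normalizing the ratios to sum to 1 and falling back to equal weights when nothing overlaps are assumptions of this sketch, and the LR-based alternative is not shown.

```python
from typing import Dict, Set


def source_weights(initial_ids_by_source: Dict[str, Set[str]],
                   preset_ids: Set[str]) -> Dict[str, float]:
    """Weight each data source by its overlap ratio with the preset identifier set."""
    ratios = {}
    for source, ids in initial_ids_by_source.items():
        # Share of this source's initial identifiers that also appear in the preset set.
        ratios[source] = len(ids & preset_ids) / len(ids) if ids else 0.0
    total = sum(ratios.values())
    if total == 0:
        # No overlap anywhere: fall back to equal weights (an assumption).
        return {s: 1.0 / len(ratios) for s in ratios}
    # Normalize so the weights express relative significance and sum to 1.
    return {s: r / total for s, r in ratios.items()}


weights = source_weights(
    {"search": {"u1", "u2", "u3", "u4"}, "ecommerce": {"u5", "u6", "u7", "u9"}},
    preset_ids={"u1", "u2", "u9"},
)
print(weights)
```

Here "search" has an overlap ratio of 2/4 and "ecommerce" 1/4, so after normalization "search" receives twice the weight of "ecommerce".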
After the preset weights of the data sources are determined, each initial identifier is scored as the weighted sum, over the data sources, of the product of its time factor and its frequency factor:

$$\mathrm{score} = \sum_{i=1}^{n} \mathrm{weight}_{\mathrm{source}_i} \cdot \mathrm{time}_{\mathrm{source}_i} \cdot \mathrm{action}_{\mathrm{source}_i}$$

Here, source denotes a data source, with n data sources in total; weight denotes the preset weight of each data source; time denotes the time information, which in this example is derived from abs(time at which the user behavior occurred - current mining time), the absolute value of the behavior time difference, and serves as the time decay parameter of the user behavior: the closer the behavior is to the current time, the larger the score, and the farther away it is, the smaller the score. action denotes the frequency information, representing how often the user identifier performed the behavior; it is normalized here with the sigmoid function, so that a higher behavior frequency yields a higher score.
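A minimal Python sketch of this weighted-sum score follows. The description fixes the structure (a per-source product of a time factor and a sigmoid-normalized frequency factor, weighted by the source's preset weight and summed) but not the exact decay function, so the exponential decay `exp(-|Δt| / tau)` and the default `tau` of seven days are assumptions; `identifier_score` is a hypothetical name.

```python
import math
from typing import Dict, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def identifier_score(per_source: Dict[str, Tuple[float, int]],
                     weights: Dict[str, float],
                     now: float,
                     tau: float = 7 * 86400.0) -> float:
    """per_source maps data source -> (behavior time, behavior count) for one identifier."""
    total = 0.0
    for source, (behavior_time, count) in per_source.items():
        # Assumed decay form: exp(-|behavior time - mining time| / tau); recent behavior scores higher.
        time_factor = math.exp(-abs(behavior_time - now) / tau)
        # Frequency normalized with the sigmoid; more behavior scores higher.
        action_factor = sigmoid(count)
        total += weights.get(source, 0.0) * time_factor * action_factor
    return total


now = 30 * 86400.0
weights = {"search": 0.6, "ecommerce": 0.4}
recent = identifier_score({"search": (now, 3)}, weights, now)
older = identifier_score({"search": (now - 14 * 86400.0, 3)}, weights, now)
print(recent > older)  # True
```

Any monotonically decreasing function of the time difference would satisfy the stated property; the exponential form is chosen here only because it stays in (0, 1].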
5) A positive and negative sample selection module, configured to select the top N identifiers according to the score ranking of the primary-selection population produced by the user scoring module (N can be set freely according to the targeted identifiers to be mined and the distribution of the feature parameters across the identifiers). Once selected, the top N identifiers are the positive samples; the positive sample set is then excluded from the identifiers of the overall active users, and a population of the same size as the positive samples (a 1:1 ratio) is selected from the remaining set as the negative sample identifiers.
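The positive-sample selection by rank can be sketched in Python as below, together with the threshold variant that this description also allows (keeping every identifier whose feature parameter is greater than or equal to a preset value). The function names are hypothetical, and breaking ties by identifier is an assumption made only so the output is deterministic.

```python
from typing import Dict, List


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Identifiers with the N highest scores, best first."""
    # Ties are broken by identifier so the result is deterministic.
    return sorted(scores, key=lambda uid: (-scores[uid], uid))[:n]


def at_least(scores: Dict[str, float], threshold: float) -> List[str]:
    """Alternative rule: every identifier whose score is >= the preset value."""
    return sorted(uid for uid, s in scores.items() if s >= threshold)


scores = {"u1": 0.9, "u2": 0.4, "u3": 0.7}
print(top_n(scores, 2))       # ['u1', 'u3']
print(at_least(scores, 0.5))  # ['u1', 'u3']
```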
Obtaining positive and negative training samples through text semantic feature mining effectively avoids the usual problem of a seed population that is too small, which leaves the model training features indistinct. Meanwhile, historical effect validation and user behavior scoring can be used to measure sample quality, which improves the accuracy of sample selection.
According to an embodiment of the present invention, an electronic device for implementing the foregoing identifier obtaining method is further provided. As shown in FIG. 11, the electronic device may include one or more processors 201 (only one is shown in the figure), a memory 203, and a transmission apparatus 205; as shown in FIG. 11, the electronic device may further include an input/output device 207.
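The memory 203 may be configured to store computer programs and modules, such as the program instructions/modules corresponding to the identifier obtaining method and apparatus in the embodiments of the present invention. The processor 201 runs the software programs and modules stored in the memory 203 to perform various functional applications and data processing, that is, to implement the foregoing identifier obtaining method. The memory 203 may include a high-speed random access memory and may further include a non-volatile memory, such as one or more magnetic storage apparatuses, flash memory, or other non-volatile solid-state memory. In some examples, the memory 203 may further include memories located remotely from the processor 201, and these remote memories may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.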
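The transmission apparatus 205 is configured to receive or send data over a network, and may also be used for data transmission between the processor and the memory. Specific examples of the network include wired and wireless networks. In one example, the transmission apparatus 205 includes a network interface controller (NIC), which can be connected to other network devices and a router through a network cable so as to communicate with the Internet or a local area network. In another example, the transmission apparatus 205 is a radio frequency (RF) module configured to communicate with the Internet wirelessly.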
Optionally, the memory 203 is configured to store an application program.
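The processor 201 may invoke, through the transmission apparatus 205, the application program stored in the memory 203 to perform the following steps: obtaining, from a plurality of data sources, identifiers corresponding to a predetermined operation, where a target data source included in the plurality of data sources records the accounts corresponding to the identifiers and the predetermined operation performed by those accounts; obtaining initial identifiers from the identifiers according to feature information of the identifiers and preset feature words, where the feature information represents features of the predetermined operation; determining feature parameters of the initial identifiers according to preset weights and the feature information, where each preset weight corresponds to a target data source and indicates the frequency at which accounts in that target data source perform the predetermined operation, and the feature parameter indicates the frequency at which an initial identifier performs the predetermined operation; and obtaining a first target identifier from the initial identifiers, where the first target identifier is the set of initial identifiers whose feature parameters are higher than a preset parameter.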
The processor 201 is further configured to perform the following steps: obtaining a first feature word and a second feature word, where the preset feature words include the first feature word and the second feature word; and obtaining the initial identifiers from the identifiers, where the feature information corresponding to the initial identifiers contains the first feature word and does not contain the second feature word.
The processor 201 is further configured to perform the following steps: obtaining the preset weight, where a larger preset weight indicates that accounts in the target data source perform the predetermined operation more frequently; obtaining time information and frequency information from the feature information, where the time information indicates when the identifier performed the predetermined operation and the frequency information indicates how often the identifier performed it; and determining the feature parameter according to the preset weight, the time information, and the frequency information, where a larger feature parameter indicates that the initial identifier performs the predetermined operation more frequently.
The processor 201 is further configured to perform one of the following: obtaining the proportion of accounts that perform the predetermined operation among all accounts included in the target data source, and assigning the preset weight to the target data source according to the proportion, where a data source with a larger proportion is assigned a larger preset weight; or obtaining the number of identifiers shared by a first identifier set and a preset identifier set, where the first identifier set is the set of initial identifiers included in one target data source, and assigning the preset weight to the target data source according to the ratio of that number to the number of identifiers in the first identifier set, where a data source with a larger ratio is assigned a larger preset weight.
The processor 201 is further configured to perform the following steps: calculating, for each target data source, the product of the time information and the frequency information corresponding to the initial identifier; and calculating the weighted sum of the products according to the preset weights to obtain the feature parameter.
The processor 201 is further configured to perform the following steps: obtaining, from the predetermined operation corresponding to the identifier, information representing features of the predetermined operation, where this information includes the feature words corresponding to the predetermined operation, the time information, and the frequency information; and storing the feature words, the time information, and the frequency information in a preset format to obtain the feature information.
The processor 201 is further configured to perform one of the following: sorting the initial identifiers by feature parameter in descending order and selecting the first target identifier from the sorted identifiers, where the first target identifier includes the top N identifiers in the sorted order; or obtaining, from the initial identifiers, the first target identifier whose feature parameter values are greater than or equal to a preset value.
The processor 201 is further configured to perform the following steps: matching the first target identifier against a preset target identifier; if the match succeeds, determining that the first target identifier is the required identifier; and if the match fails, reacquiring the first target identifier.
The processor 201 is further configured to perform the following steps: determining whether the first target identifier and the preset target identifier share at least a preset number of identical identifiers; and if they do, determining that the first target identifier successfully matches the preset target identifier.
The processor 201 is further configured to perform the following steps: obtaining the identifiers corresponding to the accounts included in the plurality of data sources; and randomly obtaining, from those identifiers, identifiers other than the first target identifier to obtain a second target identifier, where the second target identifier includes the same number of identifiers as the first target identifier.
The processor 201 is further configured to perform the following steps: training a prediction model according to the first target identifier and the second target identifier; obtaining, according to the prediction model, a to-be-pushed identifier for a to-be-pushed resource from the identifiers included in the plurality of data sources; and pushing the to-be-pushed resource to the to-be-pushed identifier.
The embodiments of the present invention provide an identifier obtaining solution. Identifiers corresponding to a predetermined operation are obtained from a plurality of data sources, where a target data source included in the plurality of data sources records the accounts corresponding to the identifiers and the predetermined operation performed by those accounts; initial identifiers are obtained from the identifiers according to the feature information of the identifiers and preset feature words, where the feature information represents features of the predetermined operation; feature parameters of the initial identifiers are determined according to preset weights and the feature information, where each preset weight corresponds to a target data source and indicates the frequency at which accounts in that source perform the predetermined operation, and the feature parameter indicates the frequency at which an initial identifier performs the predetermined operation; and a first target identifier, namely the set of initial identifiers whose feature parameters are higher than a preset parameter, is obtained from the initial identifiers.
That is, because the target data sources record the accounts corresponding to the identifiers and the predetermined operations those accounts performed, and the identifiers corresponding to the predetermined operation are obtained from them, identifiers are gathered from a wider range of sources, which avoids the bias caused by the small scale of identifiers obtainable from a single user log. The initial identifiers are first screened according to the feature information of the identifiers and the preset feature words; feature parameters are then determined for the initial identifiers from the preset weights and the feature information to represent how frequently each initial identifier performs the predetermined operation; finally, the first target identifier, whose feature parameters are higher than the preset parameter, is obtained from the initial identifiers, so that every identifier in the first target identifier performs the predetermined operation relatively frequently. This improves the accuracy of the identifiers obtained for training and thereby overcomes the problem in the related art of low accuracy in obtaining identifiers for training.
Optionally, for specific examples in this embodiment, refer to the examples described in the foregoing embodiments; details are not repeated here.
A person of ordinary skill in the art may understand that the structure shown in FIG. 11 is merely illustrative. The electronic device may be a terminal device such as a smartphone (for example, an Android or iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD. FIG. 11 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components than shown in FIG. 11 (such as a network interface or a display apparatus), or have a configuration different from that shown in FIG. 11.
A person of ordinary skill in the art may understand that all or some of the steps of the methods in the foregoing embodiments may be completed by a program instructing hardware related to the terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be located in at least one of a plurality of network devices in a network.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
S1, obtaining, from a plurality of data sources, identifiers corresponding to a predetermined operation, where a target data source included in the plurality of data sources records the accounts corresponding to the identifiers and the predetermined operation performed by those accounts;
S2, obtaining initial identifiers from the identifiers according to feature information of the identifiers and preset feature words, where the feature information represents features of the predetermined operation;
S3, determining feature parameters of the initial identifiers according to preset weights and the feature information, where each preset weight corresponds to a target data source and indicates the frequency at which accounts in that target data source perform the predetermined operation, and the feature parameter indicates the frequency at which an initial identifier performs the predetermined operation;
S4, obtaining a first target identifier from the initial identifiers, where the first target identifier is the set of initial identifiers whose feature parameters are higher than a preset parameter.
Optionally, the storage medium is further configured to store program code for performing the following steps:
S1, obtaining a first feature word and a second feature word, where the preset feature words include the first feature word and the second feature word;
S2, obtaining the initial identifiers from the identifiers, where the feature information corresponding to the initial identifiers contains the first feature word and does not contain the second feature word.
Optionally, the storage medium is further configured to store program code for performing the following steps: obtaining the preset weight, where a larger preset weight indicates that accounts in the target data source perform the predetermined operation more frequently; obtaining time information and frequency information from the feature information, where the time information indicates when the identifier performed the predetermined operation and the frequency information indicates how often it did so; and determining the feature parameter according to the preset weight, the time information, and the frequency information, where a larger feature parameter indicates that the initial identifier performs the predetermined operation more frequently.
Optionally, the storage medium is further configured to store program code for performing the following steps: obtaining the proportion of accounts that perform the predetermined operation among all accounts included in the target data source, and assigning the preset weight to the target data source according to the proportion, where a data source with a larger proportion is assigned a larger preset weight; or obtaining the number of identifiers shared by a first identifier set and a preset identifier set, where the first identifier set is the set of initial identifiers included in one target data source, and assigning the preset weight to the target data source according to the ratio of that number to the number of identifiers in the first identifier set, where a data source with a larger ratio is assigned a larger preset weight.
Optionally, the storage medium is further configured to store program code for performing the following steps: calculating, for each target data source, the product of the time information and the frequency information corresponding to the initial identifier; and calculating the weighted sum of the products according to the preset weights to obtain the feature parameter.
Optionally, the storage medium is further configured to store program code for performing the following steps: obtaining, from the predetermined operation corresponding to the identifier, information representing features of the predetermined operation, where this information includes the feature words corresponding to the predetermined operation, the time information, and the frequency information; and storing the feature words, the time information, and the frequency information in a preset format to obtain the feature information.
Optionally, the storage medium is further configured to store program code for performing the following steps: sorting the initial identifiers by feature parameter in descending order and selecting the first target identifier from the sorted identifiers, where the first target identifier includes the top N identifiers in the sorted order; or obtaining, from the initial identifiers, the first target identifier whose feature parameter values are greater than or equal to a preset value.
Optionally, the storage medium is further configured to store program code for performing the following steps: matching the first target identifier against the preset target identifier; if the match succeeds, determining that the first target identifier is the required identifier; and if the match fails, reacquiring the first target identifier.
Optionally, the storage medium is further configured to store program code for performing the following steps: determining whether the first target identifier and the preset target identifier share at least a preset number of identical identifiers; and if they do, determining that the first target identifier successfully matches the preset target identifier.
Optionally, the storage medium is further configured to store program code for performing the following steps: obtaining the identifiers corresponding to the accounts included in the plurality of data sources; and randomly obtaining, from those identifiers, identifiers other than the first target identifier to obtain a second target identifier, where the second target identifier includes the same number of identifiers as the first target identifier.
Optionally, the storage medium is further configured to store program code for performing the following steps: training a prediction model on the first target identifiers and the second target identifiers; using the prediction model to obtain, from the identifiers included in the multiple data sources, the to-be-pushed identifiers for a to-be-pushed resource; and pushing the to-be-pushed resource to the to-be-pushed identifiers.
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Optionally, for specific examples in this embodiment, refer to the examples described in Embodiment 1 and Embodiment 2; details are not repeated here.
The serial numbers of the foregoing embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
If the integrated unit in the foregoing embodiments is implemented as a software functional unit and sold or used as an independent product, it may be stored in the foregoing computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the related art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions in other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units, or modules, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The foregoing descriptions are merely preferred implementations of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.
As can be seen from the foregoing description, in the present invention the target data source records the accounts corresponding to identifiers and the predetermined operations those accounts have performed, and the identifiers corresponding to the predetermined operation are obtained from it. This broadens the sources from which identifiers can be acquired and avoids the bias that arises when identifiers are mined from a single, small user log. The initial identifiers are then preliminarily screened according to the identifiers' feature information and preset feature words, and a feature parameter is determined for each initial identifier from preset weights and the feature information to represent how frequently that identifier performs the predetermined operation. Finally, the first target identifiers, whose feature parameters are higher than a preset parameter, are obtained from the initial identifiers, so that every first target identifier is one that performs the predetermined operation frequently. This improves the accuracy of acquiring identifiers for training, thereby overcoming the problem in the related art that identifiers for training are acquired with low accuracy.
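The screening and scoring pipeline summarized above can be sketched as follows. The text does not give the exact formula, so this sketch assumes that each identifier's feature information is a mapping from words (for example, terms extracted from its behavior records) to occurrence counts, that an identifier passes preliminary screening if it contains at least one preset feature word, and that the feature parameter is the weighted sum of counts over the preset feature words. All names and numbers are hypothetical.

```python
from typing import Dict, List

FeatureInfo = Dict[str, int]  # hypothetical: word -> occurrence count for one identifier


def screen_initial(feature_info: Dict[str, FeatureInfo],
                   preset_weights: Dict[str, float]) -> List[str]:
    """Preliminary screening: keep identifiers containing at least one preset feature word."""
    return [ident for ident, info in feature_info.items()
            if any(word in preset_weights for word in info)]


def feature_parameter(info: FeatureInfo, preset_weights: Dict[str, float]) -> float:
    """Weighted sum of preset-feature-word counts, used as a proxy for operation frequency."""
    return sum(preset_weights[word] * count
               for word, count in info.items() if word in preset_weights)


def first_target(feature_info: Dict[str, FeatureInfo],
                 preset_weights: Dict[str, float],
                 preset_parameter: float) -> List[str]:
    """Initial identifiers whose feature parameter is higher than preset_parameter."""
    initial = screen_initial(feature_info, preset_weights)
    return [ident for ident in initial
            if feature_parameter(feature_info[ident], preset_weights) > preset_parameter]


if __name__ == "__main__":
    info = {"u1": {"buy": 3, "cart": 2}, "u2": {"browse": 5}, "u3": {"buy": 1}}
    weights = {"buy": 2.0, "cart": 0.5}
    print(screen_initial(info, weights))       # ['u1', 'u3']  (u2 has no preset word)
    print(first_target(info, weights, 3.0))    # ['u1']  (u1 scores 7.0, u3 scores 2.0)
```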
Claims (17)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710290180.5A CN108304426B (en) | 2017-04-27 | 2017-04-27 | Identification obtaining method and device |
| CN201710290180.5 | 2017-04-27 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018196553A1 true WO2018196553A1 (en) | 2018-11-01 |
Family ID: 62872225
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2018/081337 (Ceased; WO2018196553A1, en) | 2017-04-27 | 2018-03-30 | Method and apparatus for obtaining identifier, storage medium, and electronic device |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN108304426B (en) |
| WO (1) | WO2018196553A1 (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109636433A (en) * | 2018-10-16 | 2019-04-16 | 深圳壹账通智能科技有限公司 | Feeding card identification method, device, equipment and storage medium based on big data analysis |
| CN111967915B (en) * | 2020-08-27 | 2024-11-26 | 北京明略昭辉科技有限公司 | Media file delivery method and device, storage medium and electronic device |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102819804A (en) * | 2011-06-07 | 2012-12-12 | 阿里巴巴集团控股有限公司 | Goods information pushing method and device |
| CN102831234A (en) * | 2012-08-31 | 2012-12-19 | 北京邮电大学 | Personalized news recommendation device and method based on news content and theme feature |
| CN104317865A (en) * | 2014-10-16 | 2015-01-28 | 南京邮电大学 | musical emotion feature matching based social networking search dating method |
| CN105430504A (en) * | 2015-11-27 | 2016-03-23 | 中国科学院深圳先进技术研究院 | Method and System for Family Member Structure Recognition Based on TV Watching Log Mining |
| CN106126592A (en) * | 2016-06-20 | 2016-11-16 | 北京小米移动软件有限公司 | The processing method and processing device of search data |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20120052683A (en) * | 2010-11-16 | 2012-05-24 | 한국전자통신연구원 | Context sharing apparatus and method for providing intelligent service |
| CN103593368A (en) * | 2012-08-16 | 2014-02-19 | 深圳市世纪光速信息技术有限公司 | Method, server, terminal and system for selecting data sources |
| CN104156366B (en) * | 2013-05-13 | 2017-11-21 | 中国移动通信集团浙江有限公司 | A kind of method applied to mobile terminal recommendation network and the webserver |
| CN104090888B (en) * | 2013-12-10 | 2016-05-11 | 深圳市腾讯计算机系统有限公司 | A kind of analytical method of user behavior data and device |
- 2017-04-27: CN CN201710290180.5A, patent CN108304426B/en (Active)
- 2018-03-30: WO PCT/CN2018/081337, patent WO2018196553A1/en (not active: Ceased)
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110472879A (en) * | 2019-08-20 | 2019-11-19 | 秒针信息技术有限公司 | A kind of appraisal procedure of resource impact, device, electronic equipment and storage medium |
| CN110991296A (en) * | 2019-11-26 | 2020-04-10 | 腾讯科技(深圳)有限公司 | Video annotation method and device, electronic equipment and computer-readable storage medium |
| CN110991296B (en) * | 2019-11-26 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Video annotation method and device, electronic equipment and computer-readable storage medium |
| CN111651657A (en) * | 2020-06-04 | 2020-09-11 | 深圳前海微众银行股份有限公司 | Intelligence monitoring method, apparatus, device and computer-readable storage medium |
| CN111651657B (en) * | 2020-06-04 | 2024-05-24 | 深圳前海微众银行股份有限公司 | Information monitoring method, device, equipment and computer readable storage medium |
| CN112187746A (en) * | 2020-09-15 | 2021-01-05 | 北京明略昭辉科技有限公司 | Method and device for generating equipment identifier |
| CN113780744A (en) * | 2021-08-13 | 2021-12-10 | 唯品会(广州)软件有限公司 | Cargo combination method and device and electronic equipment |
| CN113780744B (en) * | 2021-08-13 | 2023-12-29 | 唯品会(广州)软件有限公司 | Goods combination method and device and electronic equipment |
| CN114461699A (en) * | 2022-01-28 | 2022-05-10 | 嘉兴职业技术学院 | Big data user mining method based on cross-border e-commerce platform |
| CN114461699B (en) * | 2022-01-28 | 2024-06-04 | 嘉兴职业技术学院 | A big data user mining method based on cross-border e-commerce platform |
Also Published As
| Publication number | Publication date |
|---|---|
| CN108304426A (en) | 2018-07-20 |
| CN108304426B (en) | 2021-12-17 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18791810; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 18791810; Country of ref document: EP; Kind code of ref document: A1 |