
WO2018196553A1 - Method and apparatus for obtaining an identifier, storage medium, and electronic device - Google Patents

Info

Publication number
WO2018196553A1
Authority
WO
WIPO (PCT)
Prior art keywords
identifier
target
feature
preset
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/081337
Other languages
English (en)
Chinese (zh)
Inventor
袁小燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Publication of WO2018196553A1
Anticipated expiration (legal status)
Ceased (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Recommending goods or services

Definitions

  • The present invention relates to the field of computers, and in particular to a method and an apparatus for acquiring an identifier, a storage medium, and an electronic device.
  • In the related art, sample representation words and optimization rules are prepared, and pattern matching (regular-expression matching) is applied to a single user behavior log to mine
  • The population whose characteristics match the sample representation words is selected as the positive sample population of the training data, and the negative sample population is drawn at random from the large population after the positive sample population has been excluded.
  • Because only a single user behavior log is searched, the population that can be matched is limited and the samples are biased.
  • In addition, nothing establishes the purity and reliability of the positive samples mined through pattern matching.
  • As a result of these defects, the existing way of acquiring training data samples yields identifiers for training with low accuracy.
  • Embodiments of the invention provide a method and an apparatus for acquiring an identifier, a storage medium, and an electronic device, so as to at least solve the technical problem in the related art that identifiers obtained for training have low accuracy.
  • A method for obtaining an identifier includes: obtaining identifiers corresponding to a predetermined operation from a plurality of data sources, wherein a target data source among the plurality of data sources records the account corresponding to each identifier and the predetermined operation performed by that account; obtaining initial identifiers from the identifiers according to feature information of the identifiers and preset feature words, wherein the feature information indicates a feature of the predetermined operation; determining a feature parameter of each initial identifier according to a preset weight and the feature information, wherein the preset weight corresponds to the target data source and indicates the frequency with which accounts in the target data source perform the predetermined operation, and the feature parameter indicates the frequency with which the initial identifier performs the predetermined operation; and obtaining a first target identifier from the initial identifiers, where the first target identifier is a set of identifiers in
  • An apparatus for acquiring an identifier includes: a first acquiring module, configured to acquire identifiers corresponding to a predetermined operation from a plurality of data sources, wherein a target data source among the plurality of data sources records the account corresponding to each identifier and the predetermined operation performed by that account; a second acquiring module, configured to obtain initial identifiers from the identifiers according to feature information of the identifiers and preset feature words, wherein the feature information represents a feature of the predetermined operation; and a determining module, configured to determine a feature parameter of each initial identifier according to a preset weight and the feature information, wherein the preset weight indicates the frequency with which accounts in the target data source perform the predetermined operation, and the feature parameter indicates the frequency with which the initial identifier performs the predetermined operation.
  • A third acquiring module is configured to acquire a first target identifier from the initial identifiers, wherein the first target identifier is the set of initial identifiers whose feature parameter is higher than a preset parameter.
  • A storage medium includes a stored program, wherein, when the program runs, it controls the device on which the storage medium is located to perform the above method for acquiring an identifier.
  • In the embodiments, identifiers corresponding to the predetermined operation are obtained from the plurality of data sources, where a target data source among them records the account corresponding to each identifier and the predetermined operation performed by that account.
  • Initial identifiers are then obtained from the identifiers according to their feature information and the preset feature words, where the feature information represents a feature of the predetermined operation. The feature parameter of each initial identifier is determined according to the preset weight and the feature information, where the preset weight corresponds to the target data source and indicates the frequency with which accounts in that source perform the predetermined operation, and the feature parameter indicates the frequency with which the initial identifier performs the predetermined operation. Finally, the first target identifier, that is, the set of initial identifiers whose feature parameter is higher than the preset parameter, is obtained from the initial identifiers.
  • Because the target data source records the account corresponding to each identifier and the predetermined operation performed by that account, identifiers corresponding to the predetermined operation can be gathered through a wider range of channels, avoiding the sample bias that results from relying on a single user log.
  • The identifiers are first filtered into initial identifiers according to their feature information and the preset feature words, and a feature parameter indicating how frequently each initial identifier performs the predetermined operation is then determined from the preset weight and the feature information.
  • The first target identifier, whose feature parameters are higher than the preset parameter, is then taken from the initial identifiers, so that it contains only identifiers that perform the predetermined operation frequently.
  • This improves the accuracy of the identifiers acquired for training and overcomes the low accuracy of acquiring identifiers for training in the related art.
  • FIG. 1 is a schematic diagram of a method for acquiring an identifier according to the related art.
  • FIG. 2 is a schematic diagram of an application environment of an optional method for acquiring an identifier according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of an optional method for acquiring an identifier according to an embodiment of the present invention.
  • FIG. 4 is a first schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention.
  • FIG. 5 is a second schematic diagram of an apparatus for acquiring an identifier according to an embodiment of the present invention.
  • FIG. 6 is a third schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention.
  • FIG. 7 is a fourth schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention.
  • FIG. 8 is a fifth schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention.
  • FIG. 9 is a sixth schematic diagram of an optional apparatus for acquiring an identifier according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of an application scenario of an optional method for acquiring an identifier according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
  • An embodiment of the above method for acquiring an identifier is provided.
  • The method may be, but is not limited to being, applied in the application environment shown in FIG. 2, in which the server 202 is configured to obtain identifiers corresponding to the predetermined operation from a plurality of data sources.
  • A target data source among the plurality of data sources records the account corresponding to each identifier and the operation performed by that account; the feature information indicates a feature of the predetermined operation; the preset weight corresponds to the target data source and indicates the frequency with which accounts in the target data source perform the predetermined operation; the feature parameter indicates the frequency with which an initial identifier performs the predetermined operation; and the first target identifier is the set of initial identifiers whose feature parameters are higher than the preset parameter.
  • Because the target data source records the account corresponding to each identifier and the operation performed by that account, the server 202 obtains identifiers corresponding to the predetermined operation through a wider range of channels and avoids relying on a single user log.
  • The server 202 is configured to: acquire a first feature word and a second feature word, where the preset feature words include the first feature word and the second feature word; and obtain the initial identifiers from the identifiers, where the feature information corresponding to each initial identifier carries the first feature word and does not carry the second feature word.
  • The server 202 is configured to: obtain the preset weight, where a larger preset weight indicates that accounts in the target data source perform the predetermined operation more frequently; obtain time information and frequency information from the feature information, where the time information indicates when the predetermined operation was performed and the frequency information indicates how often the identifier performed it; and determine the feature parameter according to the preset weight, the time information, and the frequency information, where a larger feature parameter indicates that the initial identifier performs the predetermined operation more frequently.
  • The server 202 is configured to: obtain the proportion of accounts in the target data source that perform the predetermined operation, and assign a preset weight to the target data source according to that proportion, where the larger the proportion, the larger the preset weight assigned to the data source; or obtain the number of identical identifiers in a first identifier set and a preset identifier set, where the first identifier set is the set of initial identifiers included in the target data source, and assign a preset weight to the target data source according to the ratio between that number and the number of identifiers in the first identifier set, where the larger the ratio, the larger the preset weight assigned to the data source.
  • The server 202 is configured to: calculate, for each target data source, the product of the time information and the frequency information of the initial identifier in that source; and calculate the weighted sum of these products using the preset weights to obtain the feature parameter.
  • The server 202 is configured to: acquire, from the predetermined operation corresponding to each identifier, information indicating the feature of the predetermined operation, including the feature words, time information, and frequency information corresponding to the predetermined operation; and store the feature words, time information, and frequency information in a preset format to obtain the feature information.
  • The server 202 is configured to: sort the initial identifiers by feature parameter from high to low and select the first target identifier from the sorted identifiers, where the first target identifier includes the top N identifiers; or obtain from the initial identifiers, as the first target identifier, the identifiers whose feature parameter is greater than or equal to a preset value.
  • The server 202 is configured to: match the first target identifier against a preset target identifier; if the match succeeds, determine that the first target identifier is the required identifier; if the match fails, re-acquire the first target identifier.
  • The server 202 is further configured to: determine whether the first target identifier and the preset target identifier share at least a preset number of identical identifiers; if they do, determine that the first target identifier successfully matches the preset target identifier.
  • The server 202 is further configured to: obtain the identifiers corresponding to the accounts included in the plurality of data sources; and randomly obtain from those identifiers, as a second target identifier, identifiers other than the first target identifier, where the second target identifier contains the same number of identifiers as the first target identifier.
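The negative-sample (second target identifier) step can be sketched as follows. This is a minimal illustration assuming identifiers are plain strings; the function name `sample_negative_identifiers` and its `seed` parameter are hypothetical, since the text only requires a random draw that excludes the first target identifier and matches its size.

```python
import random


def sample_negative_identifiers(all_identifiers, first_target, seed=None):
    """Draw the second target identifier (negative samples) at random.

    Hypothetical helper: candidates are all identifiers found in the data
    sources minus the first target identifier; the sample has the same size
    as the first target identifier.
    """
    positives = set(first_target)
    # Sort the candidates so that a fixed seed gives a reproducible draw.
    candidates = sorted(set(all_identifiers) - positives)
    if len(positives) > len(candidates):
        raise ValueError("not enough non-target identifiers to sample from")
    rng = random.Random(seed)
    # Sampling without replacement keeps the negative set free of duplicates.
    return rng.sample(candidates, len(positives))


all_ids = [f"id{i}" for i in range(10)]
positives = ["id1", "id4", "id7"]
negatives = sample_negative_identifiers(all_ids, positives, seed=0)
print(len(negatives), set(negatives) & set(positives))  # 3 set()
```

Equal positive and negative counts give the later prediction model a balanced training set; whether the original system balances exactly this way beyond "the same number" is not stated.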
  • The application environment may further include a client connected to the server 202 through a network, where the server 202 is further configured to: train a prediction model on the first target identifier and the second target identifier; use the prediction model to obtain, from the identifiers of the plurality of data sources, the identifiers to which a resource is to be pushed; and push the resource to the clients used by the accounts corresponding to those identifiers.
  • The client may include, but is not limited to, at least one of the following: a mobile phone, a tablet computer, a notebook computer, a desktop PC, a digital television, and other hardware devices for area sharing.
  • The network may include, but is not limited to, at least one of the following: a wide area network, a metropolitan area network, and a local area network. The above is only an example, and this embodiment is not limited thereto.
  • A method for obtaining an identifier includes:
  • obtaining identifiers corresponding to the predetermined operation from the plurality of data sources, where a target data source among them records the account corresponding to each identifier and the predetermined operation performed by that account;
  • S306: determining the feature parameter of each initial identifier according to the preset weight and the feature information, where the preset weight corresponds to the target data source and indicates the frequency with which accounts in the target data source perform the predetermined operation, and the feature parameter indicates the frequency with which the initial identifier performs the predetermined operation;
  • The method for acquiring an identifier may be, but is not limited to being, applied to acquiring identifier samples for model training and then using the training result to push resources to a client.
  • The client may be, but is not limited to, various types of software, such as search software, social software, instant messaging software, news software, game software, and shopping software.
  • For example, the method may be applied to a scenario in which the acquired identifier samples are used for model training and the training result is used to push resources to the client of shopping software, or to a scenario in which the training result is used to push resources to the client of search software.
  • The above is only an example and does not limit this embodiment.
  • The plurality of data sources may be various platforms, software, websites, applications, and the like, for example social applications, search engines, e-commerce websites, and advertising platforms.
  • An identifier may correspond to different accounts in different data sources.
  • For example, a user may have registered account A on a social platform, account B on a shopping website, and account C on an instant messaging application. If the user has associated these three accounts, accounts A, B, and C can all be used to uniquely identify that user.
  • The target data source may include one or more data sources. That is, the target data source records the account corresponding to the identifier in that data source and the operation performed by the account.
  • The identifier corresponding to the predetermined operation may be recorded in one of the plurality of data sources or in several of them.
  • The predetermined operation may be a behavior performed by the identified user or a phrase characterizing that behavior. For example, if the users to be mined are users who purchase maternal and infant products, the predetermined operation may be "clicking an item containing milk powder or diapers", or the phrases "milk powder", "diaper", and the like.
  • To obtain the identifiers corresponding to this predetermined operation from the plurality of data sources, one may first collect the accounts that searched for "milk powder" and "diaper" in a search engine, the accounts that purchased milk powder or diapers on a shopping website, the accounts that sent messages containing "milk powder", "diaper", and the like in instant messaging software, and the accounts that clicked items containing milk powder or diapers in any of the data sources, and then obtain the identifiers corresponding to these accounts.
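The cross-source collection described above can be sketched as follows. Everything here is hypothetical: the `ACCOUNT_TO_ID` mapping stands in for the account association in the A/B/C example, the `LOGS` records are invented sample data, and plain substring matching on operation words is an assumption, not the patent's stated matching rule.

```python
from collections import defaultdict

# Hypothetical association of (data source, account) -> unified identifier;
# accounts A, B and C of one user all map to the same identifier.
ACCOUNT_TO_ID = {
    ("social", "A"): "user-1",
    ("shopping", "B"): "user-1",
    ("im", "C"): "user-1",
    ("search", "D"): "user-2",
    ("shopping", "E"): "user-3",
}

# Hypothetical behavior records: (data source, account, logged behavior text).
LOGS = [
    ("search", "D", "searched milk powder"),
    ("shopping", "B", "bought diaper"),
    ("im", "C", "sent message: which milk powder is best"),
    ("shopping", "E", "bought headphones"),
]


def collect_identifiers(logs, account_to_id, operation_words):
    """Map each identifier to the (source, behavior) records that match the operation."""
    hits = defaultdict(list)
    for source, account, behavior in logs:
        if any(word in behavior for word in operation_words):
            identifier = account_to_id.get((source, account))
            if identifier is not None:  # skip accounts with no known identifier
                hits[identifier].append((source, behavior))
    return dict(hits)


found = collect_identifiers(LOGS, ACCOUNT_TO_ID, ["milk powder", "diaper"])
print(sorted(found))  # ['user-1', 'user-2']
```

Because accounts B and C resolve to the same identifier, evidence from the shopping site and the messaging app accumulates under one user, which is the point of drawing on several data sources.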
  • The initial identifier may include, but is not limited to, one or more identifiers.
  • The preset feature words may be, but are not limited to, one or more feature words.
  • The first target identifier may include, but is not limited to, one or more identifiers.
  • The preset weight may be used to indicate the frequency with which accounts in the target data source perform the predetermined operation.
  • In other words, the preset weight can indicate how much attention accounts in the target data source pay to the predetermined operation, which can be represented by, but is not limited to, the frequency with which those accounts perform it.
  • This frequency may be measured by, but is not limited to, the number of accounts in the target data source that perform the predetermined operation frequently (for example, accounts that perform the predetermined operation more than 5 times per day make up 50% of all accounts in the data source).
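The "more than 5 times per day" measure can be computed as below. This is a minimal sketch assuming each data source supplies a dictionary of per-account daily counts; the function name and the input shape are hypothetical.

```python
def frequent_account_share(daily_counts, min_per_day=5):
    """Fraction of accounts in one data source whose daily count of the
    predetermined operation exceeds min_per_day (hypothetical helper)."""
    if not daily_counts:
        return 0.0
    frequent = sum(1 for count in daily_counts.values() if count > min_per_day)
    return frequent / len(daily_counts)


counts = {"a1": 7, "a2": 2, "a3": 9, "a4": 0}
print(frequent_account_share(counts))  # 0.5
```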
  • The frequency with which accounts in the target data source perform the predetermined operation may also be indicated by the significance of that operation among the accounts in the target data source.
  • This significance can be determined by calculating, among the initial identifiers whose accounts are recorded in the target data source, the proportion that also belong to a preset identifier set (for example, the identifiers to which resources were last pushed).
  • The preset weight may be set directly according to the frequency with which accounts in the target data source perform the predetermined operation, or calculated from that frequency by means of a trained model.
  • In this embodiment, the target data source records the account corresponding to each identifier and the operation performed by that account, and the identifiers corresponding to the predetermined operation are obtained from it. Identifiers can therefore be gathered through a wider range of channels, avoiding the bias that arises when identifiers are obtained from a single user log.
  • The initial identifiers are preliminarily selected according to the feature information of the identifiers and the preset feature words, and a feature parameter representing how frequently each initial identifier performs the predetermined operation is determined according to the preset weight and the feature information. The first target identifier, whose feature parameters are higher than the preset parameter, is then obtained from the initial identifiers, so that it contains identifiers that perform the predetermined operation frequently.
  • This improves the accuracy of the identifiers acquired for training and overcomes the low accuracy of acquiring identifiers for training in the related art.
  • Obtaining the initial identifiers from the identifiers according to their feature information and the preset feature words includes:
  • S1: acquiring a first feature word and a second feature word, where the preset feature words include the first feature word and the second feature word;
  • S2: obtaining the initial identifiers from the identifiers, where the feature information corresponding to each initial identifier carries the first feature word and does not carry the second feature word.
  • The preset feature words may include, but are not limited to, the first feature word and the second feature word.
  • The preset feature words represent the characteristics of a class of user population and can include positive representation words and negative representation words. Positive representation words (the first feature words above) are keywords in the everyday sense and represent the feature population. Negative representation words (the second feature words above) are filter words (filter_words) whose role is denoising: they remove the noise introduced when multiple words are stitched together, so that the positive representation words better represent the feature population.
  • Obtaining the initial identifiers according to the feature information of the identifiers and the first and second feature words included in the preset feature words accomplishes the preliminary screening of the identifiers.
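The positive/negative word screening can be sketched as a set test. This assumes that "carries the first feature word" means at least one positive word is present; the text does not say whether all positive words are required. The function name and the sample feature words are hypothetical.

```python
def select_initial_identifiers(feature_info, first_words, second_words):
    """Keep identifiers whose feature words include at least one first
    (positive) feature word and none of the second (filter) feature words."""
    positive = set(first_words)
    negative = set(second_words)
    return [
        identifier
        for identifier, words in feature_info.items()
        if not positive.isdisjoint(words) and negative.isdisjoint(words)
    ]


feature_info = {
    "user-1": ["milk powder", "infant"],
    "user-2": ["milk powder", "milk tea"],  # hypothetical stitched-word noise
    "user-3": ["headphones"],
}
print(select_initial_identifiers(feature_info, ["milk powder", "diaper"], ["milk tea"]))
# ['user-1']
```

A single filter-word hit disqualifies an identifier even when it also carries a positive word, which is what lets the negative words strip out stitched-phrase noise.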
  • Determining the feature parameter of each initial identifier according to the preset weight and the feature information includes:
  • S1: obtaining the preset weight, where a larger preset weight indicates that accounts in the target data source perform the predetermined operation more frequently;
  • S2: obtaining time information and frequency information from the feature information, where the time information indicates when the predetermined operation was performed and the frequency information indicates how often it was performed;
  • S3: determining the feature parameter according to the preset weight, the time information, and the frequency information, where a larger feature parameter indicates that the initial identifier performs the predetermined operation more frequently.
  • The preset weight may be obtained in one of the following manners:
  • Manner 1: obtain the proportion of accounts in the target data source that performed the predetermined operation among all accounts included in that data source, and assign a preset weight to the target data source according to this proportion, where the larger the proportion, the larger the preset weight assigned to the data source.
  • For example, suppose there are three target data sources A, B, and C. Target data source A has 100 accounts, 34 of which performed the predetermined operation; target data source B has 200 accounts, 25 of which performed it; and target data source C has 100 accounts, 56 of which performed it. The proportions for A, B, and C are therefore 34%, 12.5%, and 56%, and according to these proportions the preset weights 2, 1, and 3 are assigned to A, B, and C respectively.
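Manner 1 can be reproduced on the example numbers as below. The text does not state how proportions become the integer weights 2, 1, and 3; ranking the sources by proportion (smallest gets 1) is an assumption that happens to reproduce the example. Both function names are hypothetical.

```python
def rank_weights(scores):
    """Assign integer weights 1..n in ascending order of score, so the source
    with the largest score receives the largest weight (assumed rule)."""
    ordered = sorted(scores, key=lambda source: (scores[source], source))
    return {source: rank for rank, source in enumerate(ordered, start=1)}


def proportion_weights(performed, totals):
    """Manner 1: weight each source by the share of its accounts that
    performed the predetermined operation."""
    proportions = {source: performed[source] / totals[source] for source in totals}
    return proportions, rank_weights(proportions)


proportions, weights = proportion_weights(
    performed={"A": 34, "B": 25, "C": 56},
    totals={"A": 100, "B": 200, "C": 100},
)
print(proportions)  # {'A': 0.34, 'B': 0.125, 'C': 0.56}
print(weights)      # {'B': 1, 'A': 2, 'C': 3}
```

Ranking keeps the weights small and ordinal; a proportional scheme (for example, using the shares themselves as weights) would also satisfy "larger proportion, larger weight" but would not reproduce the 2, 1, 3 in the example.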
  • Manner 2: obtain the number of identical identifiers in a first identifier set and a preset identifier set, where the first identifier set is the set of initial identifiers included in a target data source; then assign a preset weight to the target data source according to the ratio between this number and the number of identifiers in the first identifier set, where the larger the ratio, the larger the preset weight assigned to the data source.
  • The preset identifier set may be, but is not limited to, the identifiers included in the target data source in the previously acquired first target identifier, or the identifiers included in the target data source among the identifiers to which data was previously pushed.
  • Take as an example a preset identifier set consisting of the identifiers of the target data source in the previously acquired first target identifier. The preset identifier set A corresponding to target data source A includes 40 identifiers, the preset identifier set B corresponding to target data source B includes 30 identifiers, and the preset identifier set C corresponding to target data source C includes 40 identifiers. The initial identifiers include 20, 40, and 40 identifiers from target data sources A, B, and C respectively, that is:
  • the first identifier set A corresponding to target data source A includes 20 identifiers;
  • the first identifier set B corresponding to target data source B includes 40 identifiers;
  • the first identifier set C corresponding to target data source C includes 40 identifiers.
  • Matching the first identifier set A against the preset identifier set A yields 10 identical identifiers; matching the first identifier set B against the preset identifier set B yields 5; and matching the first identifier set C against the preset identifier set C yields 20. According to the numbers of identical identifiers obtained, preset weights 2, 1, and 3 are assigned to target data sources A, B, and C respectively.
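Manner 2 can be checked on the example sizes with synthetic sets. Note that the ratios are 10/20 = 0.5 for A, 5/40 = 0.125 for B, and 20/40 = 0.5 for C, so A and C tie on the ratio; the example weights 2, 1, 3 follow the overlap counts 10, 5, 20. The sketch below therefore ranks by (ratio, count), an assumed tie-break. The helper names and the synthetic identifiers are hypothetical.

```python
def overlap_weights(first_sets, preset_sets):
    """Manner 2: per source, compare the first identifier set with the preset
    identifier set and rank sources by (overlap ratio, overlap count)."""
    stats = {}
    for source, first in first_sets.items():
        same = len(set(first) & set(preset_sets[source]))
        stats[source] = (same / len(first), same)
    # Ties on the ratio fall back to the raw overlap count (assumption).
    ordered = sorted(stats, key=lambda source: stats[source])
    weights = {source: rank for rank, source in enumerate(ordered, start=1)}
    return stats, weights


def make_sets(prefix, n_first, n_preset, n_same):
    """Build synthetic sets of the given sizes sharing exactly n_same identifiers."""
    first = [f"{prefix}{i}" for i in range(n_first)]
    preset = first[:n_same] + [f"{prefix}-p{i}" for i in range(n_preset - n_same)]
    return first, preset


# (first set size, preset set size, identical identifiers) from the example.
sizes = {"A": (20, 40, 10), "B": (40, 30, 5), "C": (40, 40, 20)}
first_sets, preset_sets = {}, {}
for source, (n_first, n_preset, n_same) in sizes.items():
    first_sets[source], preset_sets[source] = make_sets(source, n_first, n_preset, n_same)

stats, weights = overlap_weights(first_sets, preset_sets)
print(stats)    # {'A': (0.5, 10), 'B': (0.125, 5), 'C': (0.5, 20)}
print(weights)  # {'B': 1, 'A': 2, 'C': 3}
```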
  • The feature parameter may be determined as follows: calculate, for each target data source, the product of the time information and the frequency information of the initial identifier in that source, and then calculate the weighted sum of these products using the preset weights to obtain the feature parameter.
  • Here, source denotes a data source, of which there are n; weight is the preset weight of each data source; time is the time information, which can be derived from abs(user behavior time - current mining time), the absolute time difference, used as a time-decay parameter: the closer the behavior is to the current time, the larger the feature parameter, and the farther away, the smaller; frequency is the frequency information, indicating how often the user behavior occurs.
  • The frequency is passed through a sigmoid function for normalization, so that the more often the behavior occurs, the higher the feature parameter.
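The scoring above amounts to F = sum over the n data sources of weight_i × time_i × frequency_i. The sketch below is one way to realize it under stated assumptions: the text gives only that abs(behavior time - mining time) drives a decay and that a sigmoid normalizes frequency, so the exponential half-life decay, the 30-day half-life, and day-based timestamps are hypothetical choices, as are all function names.

```python
import math


def time_factor(event_day, mining_day, half_life_days=30.0):
    """Time-decay term from abs(behavior time - mining time), in days.
    Exponential half-life decay is an assumption: recent behavior -> near 1."""
    delta = abs(event_day - mining_day)
    return 0.5 ** (delta / half_life_days)


def frequency_factor(count):
    """Sigmoid-normalized behavior frequency: more occurrences -> closer to 1."""
    return 1.0 / (1.0 + math.exp(-count))


def feature_parameter(records, weights, mining_day):
    """Weighted sum over data sources of (time factor * frequency factor).

    records: (source, event_day, count) tuples for one initial identifier.
    weights: preset weight per data source.
    """
    return sum(
        weights[source] * time_factor(day, mining_day) * frequency_factor(count)
        for source, day, count in records
    )


weights = {"A": 2, "B": 1, "C": 3}
recent = feature_parameter([("A", 100, 4), ("C", 98, 2)], weights, mining_day=100)
stale = feature_parameter([("A", 10, 4), ("C", 8, 2)], weights, mining_day=100)
print(recent > stale)  # True
```

Any decay that decreases monotonically in the time gap would satisfy the description; the half-life form is used here only because its single parameter is easy to reason about.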
  • Determining the feature parameter of each initial identifier according to the preset weight and the feature information scores the initial identifiers, measuring how frequently each performs the predetermined operation. The first target identifier selected from the initial identifiers therefore better represents the predetermined operation, which improves the accuracy of the identifiers obtained for training and overcomes the low accuracy of obtaining identifiers for training in the related art.
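  • before the initial identifier is obtained from the identifiers according to the feature information of the identifiers and the preset feature words, the method further includes:
  • S1: acquiring, from the predetermined operation corresponding to each identifier, information indicating the features of the predetermined operation, including the feature word corresponding to the predetermined operation, time information, and frequency information;
  • S2: storing the feature words, time information, and frequency information in a preset format to obtain the feature information.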
  • the information indicating the features of the predetermined operation, acquired from the predetermined operation corresponding to each identifier, is organized and stored in a preset format, which makes comparison against the feature words faster and more convenient.
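  • The text does not define the preset storage format. As an illustration only, the Python sketch below assumes a tab-separated record with the fields identifier, feature string, behavior time, and frequency; the class name, field order, and timestamp unit are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRecord:
    """One behavior record in an assumed preset format:
    identifier, feature string, behavior time, frequency."""
    identifier: str
    feature_string: str  # e.g. a search query or a clicked item title
    timestamp: int       # behavior time as Unix seconds (assumption)
    frequency: int       # number of times the behavior occurred

    def to_line(self) -> str:
        # Tab-separated, fixed field order; tabs inside the text are replaced
        # so the line always splits back into exactly four fields.
        text = self.feature_string.replace("\t", " ")
        return "\t".join([self.identifier, text, str(self.timestamp), str(self.frequency)])

    @staticmethod
    def from_line(line: str) -> "FeatureRecord":
        identifier, text, ts, freq = line.rstrip("\n").split("\t")
        return FeatureRecord(identifier, text, int(ts), int(freq))


rec = FeatureRecord("user-1", "milk powder stage 2", 1700000000, 3)
line = rec.to_line()
print(FeatureRecord.from_line(line) == rec)  # True
```

A fixed, flat layout like this lets feature-word matching run as a simple scan over the feature-string field.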
  • obtaining the first target identifier from the initial identifiers includes one of the following:
  • sorting the initial identifiers by feature parameter from high to low and selecting the first target identifier from the sorted identifiers, where the first target identifier consists of the identifiers ranked in the top N positions; or acquiring, from the initial identifiers, the identifiers whose feature parameter is greater than or equal to a preset value as the first target identifier.
  • that is, the feature parameters may be sorted from high to low, and the identifiers ranked in the top N positions are taken as the identifiers whose feature parameters are higher than the preset parameter, giving the first target identifier.
  • alternatively, a preset value may be set, and the identifiers whose feature parameter is greater than or equal to the preset value are taken as the first target identifier.
  • obtaining the first target identifier either by sorting the feature parameters from high to low or by applying a preset value makes it possible to select, from the initial identifiers, the identifiers most representative of the predetermined operation.
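  • The two selection options can be sketched in Python as follows. This is a minimal illustration assuming the feature parameters are already held in a dictionary keyed by identifier; the function names are hypothetical.

```python
def select_top_n(scores: dict, n: int) -> list:
    """Sort identifiers by feature parameter, highest first, and keep the top N."""
    ranked = sorted(scores, key=lambda ident: scores[ident], reverse=True)
    return ranked[:n]


def select_by_threshold(scores: dict, preset_value: float) -> list:
    """Keep identifiers whose feature parameter is >= the preset value."""
    return [ident for ident, score in scores.items() if score >= preset_value]


scores = {"u1": 0.91, "u2": 0.20, "u3": 0.74, "u4": 0.50}
print(select_top_n(scores, 2))           # ['u1', 'u3']
print(select_by_threshold(scores, 0.5))  # ['u1', 'u3', 'u4']
```

Top-N fixes the size of the positive set, while the threshold fixes its minimum score; which is preferable depends on whether training needs a fixed sample count.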
  • the method further includes:
  • matching the first target identifier against the preset target identifier; if they are successfully matched, determining that the first target identifier is the required identifier; if they are not successfully matched, re-acquiring the first target identifier.
  • the first target identifier is matched against the preset target identifier by determining whether the two share at least a preset number of identical identifiers. If they do, the first target identifier is determined to be successfully matched with the preset target identifier.
  • the preset target identifier may be the first target identifier acquired the previous time, or a target identifier set configured in advance.
  • when the first target identifier is re-acquired, this may be done, for example, by resetting the predetermined operation and re-acquiring the identifiers corresponding to it, or by reassigning the preset weights of the target data sources.
  • the first target identifier is matched against the preset target identifier. If the match succeeds, the currently acquired first target identifier can be deemed to meet the needs of model training, i.e., it is the required identifier. Conversely, if the match fails, the currently acquired first target identifier does not meet the needs of model training, and the first target identifier may be re-acquired.
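  • The match-then-retry logic can be sketched in Python as below. The `acquire` callback is a hypothetical stand-in for one full acquisition run, and the `max_attempts` cap is an assumption: the text does not bound the number of re-acquisitions.

```python
from typing import Callable, Optional, Set


def is_match(first_target: Set[str], preset_target: Set[str], preset_number: int) -> bool:
    """Matching succeeds when the sets share at least preset_number identifiers."""
    return len(first_target & preset_target) >= preset_number


def acquire_required(
    acquire: Callable[[int], Set[str]],
    preset_target: Set[str],
    preset_number: int,
    max_attempts: int = 5,
) -> Optional[Set[str]]:
    """Re-acquire the first target identifier until it matches, or give up.

    `acquire` is a hypothetical callback standing in for one acquisition run
    (for example, with a reset predetermined operation or reassigned preset
    weights); it receives the attempt index.
    """
    for attempt in range(max_attempts):
        candidate = acquire(attempt)
        if is_match(candidate, preset_target, preset_number):
            return candidate  # the required identifier
    return None  # no acceptable first target identifier within the cap


preset = {"a", "b", "c", "d"}
runs = [{"x", "y"}, {"a", "b", "c"}]
print(sorted(acquire_required(lambda i: runs[i], preset, 2)))  # ['a', 'b', 'c']
```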
  • the method further includes:
  • the first target identifier may be used as positive samples for model training; after the first target identifier is acquired, a second target identifier may be obtained from all identifiers of the multiple data sources to serve as negative samples for model training.
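  • after identifiers other than the first target identifier are randomly acquired from the identifiers corresponding to the accounts included in the multiple data sources to obtain the second target identifier, the method further includes: training a prediction model according to the first target identifier and the second target identifier; obtaining, according to the prediction model, the to-be-pushed identifiers for a to-be-pushed resource from the identifiers included in the multiple data sources; and pushing the to-be-pushed resource to those identifiers.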
  • the acquired first and second target identifiers may be used to train the prediction model, so that the to-be-pushed identifiers it yields more accurately represent the population targeted by the predetermined operation, which makes resource pushing more efficient.
  • the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present invention, in essence or in the part contributing to the related art, can be embodied as a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or a CD-ROM).
  • the software product includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
  • an apparatus for acquiring an identifier, used to implement the foregoing identifier acquisition method, is further provided. As shown in FIG. 4, the apparatus includes:
  • the first obtaining module 42, configured to obtain identifiers corresponding to a predetermined operation from a plurality of data sources, wherein a target data source included in the plurality of data sources records the account corresponding to each identifier and the operations performed by that account;
  • the second obtaining module 44, configured to obtain initial identifiers from the identifiers according to the feature information of the identifiers and preset feature words, wherein the feature information indicates features of the predetermined operation;
  • the determining module 46, configured to determine the feature parameter of each initial identifier according to a preset weight and the feature information, wherein the preset weight corresponds to the target data source and indicates the frequency at which accounts in the target data source perform the predetermined operation, and the feature parameter indicates the frequency at which the initial identifier performs the predetermined operation;
  • the third obtaining module 48, configured to obtain a first target identifier from the initial identifiers, wherein the first target identifier is the set of initial identifiers whose feature parameters are higher than a preset parameter.
  • the foregoing identifier acquisition apparatus may be applied, for example, to acquiring identifier samples for model training and using the training result to push resources to a client.
  • the client may be, but is not limited to, various types of software, such as search, social, instant messaging, news, game, or shopping software.
  • for example, the apparatus may be applied to a scenario in which the acquired identifier samples are used for model training and the training result is used to push resources to a client of shopping software, or to a scenario in which the training result is used to push resources to a client of search software.
  • the above are merely examples and impose no limitation in this embodiment.
  • the multiple data sources may be various platforms, software, websites, applications, and the like.
  • for example, social applications, search engines, e-commerce websites, and advertising platforms.
  • an identifier may correspond to different accounts in different data sources.
  • for example, a user may have registered account A on a social platform, account B on a shopping website, and account C on an instant messaging application; if the user has associated these three accounts, accounts A, B, and C all map to one identifier that uniquely identifies the user.
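  • The association of several platform accounts with one identifier can be pictured as a lookup table. The Python sketch below is a minimal illustration; the input layout, platform names, and function name are hypothetical.

```python
def build_identifier_index(linked: dict) -> dict:
    """Map each (platform, account) pair to the user's unified identifier.

    `linked` maps an identifier to the list of (platform, account) pairs the
    user has associated with each other.
    """
    index = {}
    for identifier, accounts in linked.items():
        for platform, account in accounts:
            index[(platform, account)] = identifier
    return index


linked = {"user-1": [("social", "A"), ("shopping", "B"), ("im", "C")]}
index = build_identifier_index(linked)
print(index[("shopping", "B")])  # user-1
```

With such an index, behavior logged under any of the accounts A, B, or C can be attributed to the same identifier when the data sources are merged.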
  • the target data source may include one or more data sources; that is, a target data source is a data source that records the account corresponding to an identifier and the operations performed by that account.
  • the identifier corresponding to the predetermined operation may be recorded in one of the plurality of data sources or in several of them.
  • the predetermined operation may be a behavior performed by an identifier, or a phrase characterizing that behavior. For example, if the users to be mined are buyers of maternal and infant products, the predetermined operation may be "clicking an item containing milk powder or diapers", or phrases such as "milk powder" and "diaper".
  • to obtain the identifiers corresponding to the predetermined operation from the plurality of data sources, one may first collect the accounts that searched for "milk powder" or "diaper" in a search engine, the accounts that bought milk powder or diapers on a shopping website, the accounts that sent messages containing "milk powder", "diaper", and similar words in instant messaging software, and the accounts that clicked items containing milk powder or diapers in the multiple data sources, and then obtain the identifiers corresponding to these accounts.
  • the initial identifier may include one or more identifiers.
  • the preset feature words may include one or more feature words.
  • the first target identifier may include one or more identifiers.
  • the preset weight may be used to indicate a frequency at which an account in the target data source performs a predetermined operation.
  • more specifically, the preset weight may indicate the degree of attention that accounts in the target data source pay to the predetermined operation, which may be expressed, for example, by the frequency with which those accounts perform the predetermined operation.
  • this frequency may be expressed, for example, by the proportion of accounts in the target data source that perform the predetermined operation frequently (for example, accounts that perform it more than 5 times per day make up 50% of all accounts in the data source).
  • alternatively, the saliency with which accounts in the target data source perform the predetermined operation may be used to indicate this frequency.
  • this saliency can be determined by comparing the initial identifiers whose accounts are recorded in the target data source with a reference identifier set (for example, the identifiers to which resources were last pushed) and calculating the proportion they share.
  • the preset weight may be set directly according to the frequency at which accounts in the target data source perform the predetermined operation, or may be calculated from that frequency by training a model.
  • because the target data sources record the accounts corresponding to the identifiers and the operations performed by those accounts, obtaining the identifiers corresponding to the predetermined operation from them widens the sources of identifiers and avoids the bias that arises when identifiers are obtained from a single user log.
  • the initial identifiers are preliminarily selected according to the feature information of the identifiers and the preset feature words, and their feature parameters, determined from the preset weights and the feature information, represent how frequently each initial identifier performs the predetermined operation.
  • the first target identifier, whose feature parameters are higher than the preset parameter, is then obtained from the initial identifiers, so that it contains identifiers that perform the predetermined operation frequently. This improves the accuracy of acquiring identifiers for training and overcomes the low accuracy of acquiring training identifiers in the related art.
  • the second obtaining module 44 includes:
  • the first obtaining unit 52 is configured to acquire the first feature word and the second feature word, wherein the preset feature word includes the first feature word and the second feature word;
  • the second obtaining unit 54 is configured to obtain an initial identifier from the identifier, where the feature information corresponding to the initial identifier carries the first feature word and does not carry the second feature word.
  • the preset feature words may include, but are not limited to, a first feature word and a second feature word.
  • the preset feature words represent the characteristics of a user population and may include positive representation words and negative representation words. Positive representation words (equivalent to the first feature words) are keywords in the usual sense and characterize the target population; negative representation words (equivalent to the second feature words) are filter words (filter_words) whose role is denoising, i.e., removing noise introduced by multi-word splicing, so that the positive representation words better represent the target population.
  • the initial identifier is obtained from the identifier according to the feature information of the identifier and the first feature word and the second feature word included in the preset feature word, thereby implementing preliminary screening of the identifier.
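  • The positive/negative word screening can be sketched in Python as a substring check applied per record. This is an illustration under assumptions: plain substring matching stands in for whatever pattern matching is actually used, and the example words (including the filter word "adult diaper") are hypothetical.

```python
def select_initial_identifiers(records, positive_words, negative_words) -> set:
    """Return identifiers having at least one record whose feature string
    contains a positive (first) feature word and no negative (second) one.

    records: iterable of (identifier, feature_string) pairs.
    """
    selected = set()
    for identifier, text in records:
        has_positive = any(word in text for word in positive_words)
        has_negative = any(word in text for word in negative_words)
        if has_positive and not has_negative:
            selected.add(identifier)
    return selected


records = [
    ("u1", "buy milk powder stage 2"),
    ("u2", "adult diaper bulk pack"),
    ("u3", "running shoes"),
    ("u4", "newborn diaper"),
]
print(sorted(select_initial_identifiers(records, ["milk powder", "diaper"], ["adult diaper"])))
# ['u1', 'u4']
```

Here the filter word removes "u2", whose text contains the keyword "diaper" only as part of a phrase unrelated to the target population.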
  • the determining module 46 includes:
  • the third obtaining unit 62 is configured to acquire a preset weight, wherein a larger value of the preset weight indicates that the frequency of performing the predetermined operation by the account in the target data source is higher;
  • the fourth obtaining unit 64 is configured to obtain the time information and the frequency information from the feature information, wherein the time information indicates when the identifier performed the predetermined operation, and the frequency information indicates how often the identifier performs the predetermined operation;
  • the determining unit 66 is configured to determine the feature parameter according to the preset weight, the time information, and the frequency information, wherein the larger the value of the feature parameter is, the higher the frequency at which the initial identifier performs the predetermined operation.
  • the third obtaining unit 62 is configured to do one of the following:
  • obtain the number of identifiers that a first identifier set shares with a preset identifier set, where the first identifier set is the set of initial identifiers included in one target data source, and assign the preset weight according to the ratio between this number and the number of identifiers in the first identifier set; the larger the ratio, the larger the preset weight assigned to the data source.
  • for example, suppose there are three target data sources: A, B, and C. Target data source A has 100 accounts, 34 of which have performed the predetermined operation; target data source B has 200 accounts, 25 of which have performed it; and target data source C has 100 accounts, 56 of which have performed it. The ratios for target data sources A, B, and C are then 34%, 12.5%, and 56%, respectively, and according to these ratios they are assigned preset weights 2, 1, and 3.
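  • The example assigns weights 2, 1, and 3 to ratios of 34%, 12.5%, and 56%, which is consistent with simply ranking the ratios. The Python sketch below assumes that ranking rule; the text itself states only that a larger ratio yields a larger weight, and the function names are hypothetical.

```python
def saliency_ratios(stats: dict) -> dict:
    """stats maps source -> (accounts that performed the operation, total accounts)."""
    return {source: performed / total for source, (performed, total) in stats.items()}


def rank_weights(values: dict) -> dict:
    """Assumed rule: weight = 1-based rank in ascending order of value, so a
    larger ratio gets a larger weight. Ties are broken by input order."""
    ordered = sorted(values, key=lambda source: values[source])
    return {source: rank for rank, source in enumerate(ordered, start=1)}


stats = {"A": (34, 100), "B": (25, 200), "C": (56, 100)}
ratios = saliency_ratios(stats)  # A: 0.34, B: 0.125, C: 0.56
print(rank_weights(ratios))      # {'B': 1, 'A': 2, 'C': 3}
```

Rank-based weights ignore how far apart the ratios are; using the ratios themselves (or a scaled version) as weights would be an equally plausible reading.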
  • the preset identifier set may be, for example, the identifiers of the target data source included in the previously acquired first target identifier, or the identifiers of the target data source included among the identifiers to which data was previously pushed.
  • taking as an example the case in which the preset identifier sets are the identifiers of each target data source in the previously acquired first target identifier: the preset identifier set A corresponding to target data source A contains 40 identifiers, the preset identifier set B corresponding to target data source B contains 30 identifiers, and the preset identifier set C corresponding to target data source C contains 40 identifiers; the initial identifiers include 20, 40, and 40 identifiers from target data sources A, B, and C, respectively.
  • that is, the first identifier set A corresponding to target data source A contains 20 identifiers, the first identifier set B corresponding to target data source B contains 40 identifiers, and the first identifier set C corresponding to target data source C contains 40 identifiers.
  • matching the first identifier set A against the preset identifier set A yields 10 common identifiers; matching the first identifier set B against the preset identifier set B yields 5 common identifiers; and matching the first identifier set C against the preset identifier set C yields 20 common identifiers. According to these numbers of common identifiers, target data sources A, B, and C are assigned preset weights 2, 1, and 3, respectively.
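  • The overlap example can be reproduced with synthetic sets of the stated sizes. In this Python sketch, the weights come from ranking the raw counts of common identifiers, which matches the example's weights 2, 1, and 3; the ranking rule, the synthetic identifiers, and the helper names are assumptions for illustration.

```python
def overlap_counts(first_sets: dict, preset_sets: dict) -> dict:
    """Number of identifiers each source's first identifier set shares with
    its preset identifier set."""
    return {source: len(first_sets[source] & preset_sets[source]) for source in first_sets}


def rank_weights(values: dict) -> dict:
    """Assumed rule: 1-based ascending rank, so more shared identifiers -> larger weight."""
    ordered = sorted(values, key=lambda source: values[source])
    return {source: rank for rank, source in enumerate(ordered, start=1)}


def ids(prefix: str, n: int) -> set:
    """Build n synthetic identifiers, e.g. ids('a', 2) -> {'a0', 'a1'}."""
    return {f"{prefix}{i}" for i in range(n)}


# Synthetic sets reproducing the sizes and overlaps in the example.
first_sets = {"A": ids("a", 20), "B": ids("b", 40), "C": ids("c", 40)}
preset_sets = {
    "A": ids("a", 10) | ids("pa", 30),  # 40 identifiers, 10 shared
    "B": ids("b", 5) | ids("pb", 25),   # 30 identifiers, 5 shared
    "C": ids("c", 20) | ids("pc", 20),  # 40 identifiers, 20 shared
}
counts = overlap_counts(first_sets, preset_sets)
print(counts)                # {'A': 10, 'B': 5, 'C': 20}
print(rank_weights(counts))  # {'B': 1, 'A': 2, 'C': 3}
```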
  • the fourth obtaining unit 64 is configured to: calculate, for each target data source, the product of the time information and the frequency information corresponding to the initial identifier; and calculate the weighted sum of these products according to the preset weights to obtain the feature parameter.
  • in the scoring formula, source denotes a data source, of which there are n; weight denotes the preset weight of each data source; time denotes the time information described above, which can be represented by abs(user behavior occurrence time - current mining time), i.e., the absolute time difference of the behavior, and serves as the time-decay parameter of the user behavior: the closer the behavior is to the current time, the larger the feature parameter, and the farther from the current time, the smaller the feature parameter; action denotes the frequency information described above, i.e., how frequently the user performs the behavior.
  • a sigmoid function is applied to the frequency term for normalization, so the more frequent the behavior, the higher the feature parameter.
  • determining the feature parameter of each initial identifier from the preset weights and the feature information amounts to scoring the initial identifiers. The score measures how frequently an initial identifier performs the predetermined operation, so the first target identifiers selected from the initial identifiers are more representative of the predetermined operation. This improves the accuracy of acquiring identifiers for training and thereby overcomes the low accuracy of acquiring training identifiers in the related art.
  • the apparatus further includes:
  • a sixth obtaining module configured to acquire information for indicating a feature of the predetermined operation from a predetermined operation corresponding to the identifier, wherein the information for indicating the feature of the predetermined operation comprises: a feature word corresponding to the predetermined operation, time information, and frequency information;
  • the storage module is configured to store the feature words, the time information, and the frequency information in a preset format to obtain the feature information.
  • organizing the information that indicates the features of the predetermined operation, acquired from the predetermined operation corresponding to each identifier, into a preset format for storage makes comparison against the feature words faster and more convenient.
  • the third obtaining module 48 includes one of the following:
  • the processing unit 72 is configured to sort the initial identifiers by feature parameter from high to low and select the first target identifier from the sorted identifiers, wherein the first target identifier consists of the identifiers ranked in the top N positions;
  • the fifth obtaining unit 74 is configured to acquire, from the initial identifier, a first target identifier whose value of the feature parameter is greater than or equal to a preset value.
  • that is, the feature parameters may be sorted from high to low, and the identifiers ranked in the top N positions are taken as the identifiers whose feature parameters are higher than the preset parameter, giving the first target identifier.
  • alternatively, a preset value may be set, and the identifiers whose feature parameter is greater than or equal to the preset value are taken as the first target identifier.
  • the foregoing apparatus further includes:
  • the matching module 82 is configured to match the first target identifier with the preset target identifier
  • the processing module 84 is configured to determine that the first target identifier is the required identifier if the first target identifier and the preset target identifier are successfully matched, and to re-acquire the first target identifier if they are not.
  • the matching module 82 is configured to: determine whether the first target identifier and the preset target identifier share at least a preset number of identical identifiers; and, if they do, determine that the first target identifier is successfully matched with the preset target identifier.
  • the preset target identifier may be the first target identifier acquired the previous time, or a target identifier set configured in advance.
  • when the first target identifier is re-acquired, this may be done, for example, by resetting the predetermined operation and re-acquiring the identifiers corresponding to it, or by reassigning the preset weights of the target data sources.
  • the first target identifier is matched against the preset target identifier. If the match succeeds, the currently acquired first target identifier can be deemed to meet the needs of model training, i.e., it is the required identifier. Conversely, if the match fails, the currently acquired first target identifier does not meet the needs of model training, and the first target identifier may be re-acquired.
  • the foregoing apparatus further includes:
  • the fourth obtaining module 92 is configured to acquire an identifier corresponding to the account included in the plurality of data sources;
  • the fifth obtaining module 94 is configured to randomly obtain, from the identifiers corresponding to the accounts included in the plurality of data sources, identifiers other than the first target identifier to obtain a second target identifier, wherein the second target identifier contains the same number of identifiers as the first target identifier.
  • the first target identifier may be used as positive samples for model training; after the first target identifier is acquired, the second target identifier may be obtained from all identifiers of the multiple data sources to serve as negative samples for model training.
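  • Same-size negative sampling, as described for the fifth obtaining module 94, can be sketched in Python with the standard `random` module. The function name and the optional seed parameter are illustrative assumptions.

```python
import random


def sample_negatives(all_identifiers, positives, seed=None) -> set:
    """Randomly draw as many non-positive identifiers as there are positives.

    all_identifiers: identifiers of all accounts across the data sources.
    positives: the first target identifier (positive samples).
    """
    positives = set(positives)
    pool = sorted(set(all_identifiers) - positives)  # sorted so a fixed seed is reproducible
    if len(positives) > len(pool):
        raise ValueError("not enough remaining identifiers for a same-size negative set")
    rng = random.Random(seed)
    return set(rng.sample(pool, len(positives)))


everyone = {f"u{i}" for i in range(100)}
negatives = sample_negatives(everyone, {"u1", "u2", "u3"}, seed=42)
print(len(negatives))  # 3
```

Excluding the positives before sampling guarantees the two training sets are disjoint, and drawing the same number keeps the classes balanced.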
  • the foregoing apparatus further includes:
  • a training module configured to train the prediction model according to the first target identifier and the second target identifier
  • the seventh obtaining module is configured to obtain, according to the prediction model, the to-be-pushed identifiers for a to-be-pushed resource from the identifiers included in the plurality of data sources;
  • the push module is configured to push the to-be-pushed resource to the to-be-pushed identifier.
  • the acquired first and second target identifiers may be used to train the prediction model, so that the to-be-pushed identifiers it yields more accurately represent the population targeted by the predetermined operation, which makes resource pushing more efficient.
  • the application environment of this embodiment may be, but is not limited to, that of the first embodiment, and is not described again here.
  • An embodiment of the present invention provides an optional specific application example for implementing the foregoing method for obtaining an identifier.
  • the method for obtaining the identifier may be applied, for example, to the identifier acquisition scenario shown in FIG. 10.
  • the plurality of data sources provide data to the server; the server obtains the first target identifier and the second target identifier from this data and trains the prediction model on them; the trained prediction model then selects, from all identifiers, the identifiers to which the to-be-pushed resource should be pushed, and the resource is pushed to the clients logged in under those identifiers.
  • the multiple data sources may include social, search, e-commerce, advertising, and mobile application (app) platforms. The user behaviors of each identifier in the social, search, e-commerce, advertising, and mobile app domains are used as its feature information, and the primary-selection population in each vertical industry is mined through text semantics. The saliency of the historical effect in each target data source is verified by counting the identical identifiers in the first identifier set and the preset identifier set.
  • a preset weight is assigned according to this saliency, and the primary-selection identifiers are ranked according to the preset weights, the frequency information (for example, user behavior frequency), and the time information (for example, the time decay factor). The identifiers ranked in the top N positions form the first target identifier, and the historical effect is cross-validated by matching the first target identifier against the preset target identifier, so that positive samples for the training data can be selected effectively. Then, after the selected positive sample set is subtracted from the overall active user population, a second target identifier of the same size is randomly drawn from the remaining set as the negative sample set. In this way, the server obtains the first target identifier and the second target identifier.
  • in this example, the positive and negative samples of the training data are obtained through text-semantic feature mining that integrates the user's various behavior features in the social, search, e-commerce, advertising, and mobile app domains. The user behavior frequency factor (the frequency information), the behavior time decay factor (the time information), and the historical-effect verification of user behaviors on different data sources, which gives different behaviors different weight factors (the preset weights), are combined to give each user a score (the feature parameter).
  • users can be sorted by this score, which effectively controls the purity of the positive samples (the first target identifier), and the identifiers ranked in the top N positions can be freely chosen as positive samples of the training data as needed. This solves the problems of single-source user behavior and low positive-sample purity.
  • the behavior characteristics of users across various Internet scenarios can thus be integrated to mine the identifiers of user populations with specific representative meanings, and positive and negative samples of higher purity can be obtained through verification.
  • the foregoing server in this embodiment may include the following functional modules:
  • a feature representation word collection module, configured to define feature representation words (equivalent to the preset feature words) according to the characteristics of the identifiers corresponding to the specific population to be filtered. These include positive representation words (equivalent to the first feature words), i.e., keywords in the usual sense, and negative representation words (equivalent to the second feature words), i.e., filter words (filter_words), whose function is denoising: removing noise introduced by multi-word splicing so that the positive representation words better represent the target population.
  • a user multi-behavior feature fusion module, configured to extract, from the user's various behaviors in the social, search, e-commerce, advertising, and mobile app domains, the key elements (user identifier, feature representation string, time information, frequency information).
  • a pattern matching module, configured to pattern-match the feature representation words from the feature representation word collection module against the user multi-behavior data (user identifier, feature representation string, time information, frequency information) from the fusion module, and to take the user identifiers that contain a positive representation word but no negative representation word as primary-selection identifiers.
  • a user scoring module, configured to score the primary selection identifiers from the pattern matching module (that is, to obtain their feature parameters). Scoring has two parts: one part calculates the preset weight of each data source, and the other computes a fine-grained behavior score for each primary selection identifier. The weights can be calculated in two ways. In the first way, the data sources are divided into population packages, and the identifiers shared by the first identifier set and the preset identifier set are used to verify the significance of each population package on a single target data source; the preset weight of the current data source is then assigned according to its relative significance. In the second way, the weights are obtained by model training, for example with logistic regression (LR): an initial weight is first assigned to each data source, each data source is then treated as a feature and the model is trained on the small-scale positive and negative samples from the primary selection, and once the iterations converge the model outputs the preset weight of each data source.
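The model-training route to the data-source weights can be sketched as a logistic regression in which each data source contributes one feature. The sketch assumes that each sample's feature vector holds its behavior score on every source, uses plain batch gradient descent, and runs a fixed number of epochs in place of a convergence test; `train_source_weights` and its parameters are hypothetical names.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_source_weights(features, labels, lr=0.5, epochs=2000, init_weight=0.1):
    """features[i][s]: behavior score of sample i on data source s (assumed input).
    labels[i]: 1 for a positive sample, 0 for a negative sample.
    Returns (weights, bias); weights[s] is read as the weight of data source s."""
    n_samples = len(features)
    n_sources = len(features[0])
    weights = [init_weight] * n_sources  # same initial weight for every data source
    bias = 0.0
    for _ in range(epochs):  # fixed epoch count stands in for a convergence check
        grad_w = [0.0] * n_sources
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # derivative of the log loss with respect to the logit
            for s in range(n_sources):
                grad_w[s] += err * x[s]
            grad_b += err
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias
```

A source whose scores separate positives from negatives ends up with a larger weight than a source whose scores look alike in both groups.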
  • each initial identifier is scored according to the following formula:
  • source denotes the data source;
  • weight denotes the preset weight of each data source;
  • time is the time information. In this example it is abs(user behavior occurrence time - current mining time), that is, the absolute value of the behavior time difference, used as the time-decay parameter of the user behavior: the closer the behavior is to the current time, the larger the score, and the farther from the current time, the smaller the score;
  • the action is
  • the frequency information represents how often the user identifier performs the behavior;
  • it is normalized with a sigmoid function, so the more frequent the behavior, the higher the score.
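Because the formula itself is not reproduced here, the following sketch assumes the structure given for the feature parameter elsewhere in this description: for each data source, the time term is multiplied by the sigmoid-normalized frequency, and the products are summed with the preset source weights. The half-life decay `0.5 ** (delta / half_life)` is an assumption chosen only so that closer behaviors score higher; the patent does not specify the decay function, and all function names are hypothetical.

```python
import math


def time_decay(event_time, mining_time, half_life=7 * 86400.0):
    """Hypothetical decay: 1.0 at the mining time, halving every half_life seconds."""
    delta = abs(event_time - mining_time)  # absolute behavior time difference
    return 0.5 ** (delta / half_life)


def frequency_score(frequency):
    """Sigmoid normalization: more frequent behavior gives a higher score (below 1)."""
    return 1.0 / (1.0 + math.exp(-frequency))


def identifier_score(behaviors, source_weights, mining_time):
    """behaviors: list of (source, event_time, frequency) for one initial identifier.
    Returns the weighted sum over sources of time_decay * frequency_score.
    Sources without a preset weight contribute 0."""
    total = 0.0
    for source, event_time, frequency in behaviors:
        weight = source_weights.get(source, 0.0)
        total += weight * time_decay(event_time, mining_time) * frequency_score(frequency)
    return total
```

With the default half-life, a behavior that happened 14 days before the mining time contributes one quarter of what the same behavior would contribute at the mining time.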
  • a positive and negative sample selection module, configured to rank the primary selection population by the scores from the user scoring module and select the identifiers ranked in the top N (the value of N may differ depending on the targeted identifiers to be mined and on the feature parameters); the top N identifiers are the positive samples. The positive samples are then excluded from the identifiers of the overall active user pool, and an identifier set of the same size as the positive sample set is randomly selected from the remaining identifiers as the negative samples.
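The positive and negative sample selection can be sketched as below, assuming the scores are held in a dict and the overall active user pool is a plain collection of identifiers. The function name and the `seed` parameter are illustrative; `random.sample` raises `ValueError` if fewer identifiers remain than there are positives.

```python
import random


def select_samples(scores, active_pool, n, seed=None):
    """scores: dict identifier -> feature parameter for the primary selection.
    active_pool: all active user identifiers (the overall pool).
    Returns (positives, negatives): the top-n identifiers by score, and an equally
    sized random sample from the pool after the positives are removed."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    positives = ranked[:n]
    # Sorting the remainder makes the draw reproducible for a given seed.
    remaining = sorted(set(active_pool) - set(positives))
    rng = random.Random(seed)
    negatives = rng.sample(remaining, len(positives))
    return positives, negatives
```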
  • an electronic device for implementing the above identifier acquisition method.
  • as shown in FIG. 11, the electronic device may include one or more processors 201 (only one is shown in the figure), a memory 203, and a transmission device 205, and may further include an input/output device 207.
  • the memory 203 can be used to store software programs and modules, such as the program instructions/modules corresponding to the identifier acquisition method and apparatus in the embodiments of the present invention.
  • the processor 201 runs the software programs and modules stored in the memory 203 to perform various functional applications and data processing, that is, to implement the above identifier acquisition method.
  • the memory 203 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 203 may further include memory located remotely from the processor 201, which can be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the above described transmission device 205 is used to receive or transmit data via a network, and can also be used for data transmission between the processor and the memory. Specific examples of the above network may include a wired network and a wireless network.
  • in one example, the transmission device 205 includes a network interface controller (NIC) that can be connected to other network devices and routers via a network cable to communicate with the Internet or a local area network.
  • in another example, the transmission device 205 is a radio frequency (RF) module used to communicate with the Internet wirelessly.
  • the memory 203 is used to store an application.
  • the processor 201 may call, through the transmission device 205, the application stored in the memory 203 to perform the following steps: acquiring identifiers corresponding to a predetermined operation from a plurality of data sources, where a target data source among the plurality of data sources records the accounts corresponding to the identifiers and the predetermined operations performed by those accounts; obtaining initial identifiers from the identifiers according to the feature information of the identifiers and preset feature words, where the feature information represents features of the predetermined operation; determining a feature parameter of each initial identifier according to a preset weight and the feature information, where the preset weight corresponds to the target data source and indicates the frequency at which accounts in the target data source perform the predetermined operation, and the feature parameter indicates the frequency at which the initial identifier performs the predetermined operation; and obtaining a first target identifier from the initial identifiers, where the first target identifier is a set of identifiers in the initial identifie
  • the processor 201 is further configured to perform the following steps: acquiring a first feature word and a second feature word, where the preset feature words include the first feature word and the second feature word; and obtaining the initial identifiers from the identifiers, where the feature information corresponding to an initial identifier carries the first feature word and does not carry the second feature word.
  • the processor 201 is further configured to perform the following steps: acquiring the preset weight, where a larger preset weight indicates that accounts in the target data source perform the predetermined operation more frequently; acquiring the time information and frequency information in the feature information, where the time information indicates when the identifier performed the predetermined operation and the frequency information indicates how often the identifier performs it; and determining the feature parameter according to the preset weight, the time information, and the frequency information, where a larger feature parameter indicates that the initial identifier performs the predetermined operation more frequently.
  • the processor 201 is further configured to perform one of the following: acquiring the proportion of accounts in the target data source that perform the predetermined operation among all accounts included in the target data source, and allocating the preset weight to the target data source according to that proportion, where a larger proportion yields a larger preset weight; or acquiring the number of identical identifiers in a first identifier set and a preset identifier set, where the first identifier set is the set of initial identifiers contained in one of the target data sources, and allocating the preset weight to the target data source according to the ratio between that number and the number of identifiers in the first identifier set, where a larger ratio yields a larger preset weight.
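Both weight-allocation options can be sketched as ratio computations. The sketch assumes the raw ratios are normalized so the weights sum to 1; the patent only requires that a larger ratio yield a larger weight, so normalization is one possible choice, and the function names are hypothetical.

```python
def weights_by_activity_ratio(active_counts, total_counts):
    """Option 1: ratio = accounts performing the predetermined operation / all accounts,
    per source; ratios are then normalized to sum to 1 (assumed)."""
    ratios = {s: (active_counts.get(s, 0) / n if n else 0.0)
              for s, n in total_counts.items()}
    total = sum(ratios.values())
    return {s: (r / total if total else 0.0) for s, r in ratios.items()}


def weights_by_overlap(initial_ids_by_source, preset_ids):
    """Option 2: ratio = |initial IDs of the source & preset IDs| / |initial IDs of the source|,
    per source; ratios are then normalized to sum to 1 (assumed)."""
    preset = set(preset_ids)
    ratios = {}
    for source, ids in initial_ids_by_source.items():
        ids = set(ids)
        ratios[source] = len(ids & preset) / len(ids) if ids else 0.0
    total = sum(ratios.values())
    return {s: (r / total if total else 0.0) for s, r in ratios.items()}
```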
  • the processor 201 is further configured to perform the following steps: calculating, for each target data source, the product of the time information and the frequency information corresponding to the initial identifier; and calculating the weighted sum of these products according to the preset weights to obtain the feature parameter.
  • the processor 201 is further configured to perform the following steps: obtaining, from the predetermined operation corresponding to the identifier, information indicating features of the predetermined operation, where this information includes the feature words, time information, and frequency information corresponding to the predetermined operation; and storing the feature words, time information, and frequency information in a preset format to obtain the feature information.
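Building the feature information in a preset format might look like the following. The tab-separated layout (identifier, source, feature word, latest timestamp, frequency), the choice of the most recent timestamp as the time information, and the function name are assumptions for illustration.

```python
from collections import defaultdict


def build_feature_lines(events):
    """events: iterable of (user_id, source, feature_word, timestamp) raw behavior records.
    Aggregates per (user_id, source, feature_word): frequency = number of events,
    time = most recent timestamp. Returns tab-separated lines in a hypothetical
    preset format: user_id, source, feature_word, latest_timestamp, frequency."""
    latest = {}
    counts = defaultdict(int)
    for user_id, source, word, ts in events:
        key = (user_id, source, word)
        counts[key] += 1
        latest[key] = max(ts, latest.get(key, ts))
    return ["\t".join([u, s, w, str(latest[(u, s, w)]), str(counts[(u, s, w)])])
            for (u, s, w) in sorted(counts)]
```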
  • the processor 201 is further configured to perform one of the following: sorting the initial identifiers by feature parameter from high to low and selecting the first target identifier from the sorted identifiers, where the first target identifier includes the top N identifiers; or obtaining, from the initial identifiers, the first target identifier whose feature parameter is greater than or equal to a preset value.
  • the processor 201 is further configured to perform the following steps: matching the first target identifier against a preset target identifier; if the first target identifier successfully matches the preset target identifier, determining that the first target identifier is the required identifier; and if the match fails, re-acquiring the first target identifier.
  • the processor 201 is further configured to perform the following steps: determining whether the first target identifier and the preset target identifier share at least a preset number of identical identifiers; and if they do, determining that the first target identifier successfully matches the preset target identifier.
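The matching check and the re-acquisition loop can be sketched as follows. `candidates_fn` is a hypothetical caller-supplied function that produces a new first target identifier set on each attempt (for example, by changing N), and `max_attempts` is an assumed bound that the patent does not mention.

```python
def matches_preset(first_target_ids, preset_target_ids, min_overlap):
    """True if the two identifier sets share at least min_overlap identical identifiers."""
    return len(set(first_target_ids) & set(preset_target_ids)) >= min_overlap


def acquire_validated(candidates_fn, preset_target_ids, min_overlap, max_attempts=3):
    """Call candidates_fn(attempt) to (re-)acquire a first target identifier set until it
    matches the preset target set. Returns the matching set, or None if none matches."""
    for attempt in range(max_attempts):
        first_target = candidates_fn(attempt)
        if matches_preset(first_target, preset_target_ids, min_overlap):
            return first_target
    return None
```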
  • the processor 201 is further configured to perform the following steps: acquiring the identifiers corresponding to the accounts included in the plurality of data sources; and randomly obtaining, from those identifiers, identifiers other than the first target identifier to form a second target identifier, where the second target identifier contains the same number of identifiers as the first target identifier.
  • the processor 201 is further configured to perform the following steps: training a prediction model according to the first target identifier and the second target identifier; obtaining, according to the prediction model, identifiers to be pushed for a resource to be pushed from the identifiers included in the plurality of data sources; and pushing the resource to be pushed to those identifiers.
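The patent does not name the prediction model, so the sketch below substitutes a simple nearest-centroid classifier purely for illustration: identifiers closer to the mean feature vector of the positive samples than to that of the negative samples are selected for pushing. The feature vectors, the function names, and the `send` callback are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train_nearest_centroid(pos_vectors, neg_vectors):
    """Stand-in prediction model: the mean feature vector of each class."""
    return centroid(pos_vectors), centroid(neg_vectors)


def select_push_targets(model, candidates):
    """candidates: dict identifier -> feature vector. Returns, sorted, the identifiers
    whose vector is closer (Euclidean) to the positive centroid than to the negative one."""
    pos_c, neg_c = model
    return sorted(uid for uid, vec in candidates.items()
                  if math.dist(vec, pos_c) < math.dist(vec, neg_c))


def push(resource, identifiers, send):
    """Deliver the resource to each selected identifier via a caller-supplied send()."""
    for uid in identifiers:
        send(uid, resource)
```

`math.dist` requires Python 3.8 or later. In practice any classifier trained on the first and second target identifiers could take the place of the centroid model.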
  • An embodiment of the present invention provides a solution for obtaining an identifier.
  • initial identifiers are obtained from the identifiers according to the feature information and the preset feature words, where the feature information represents features of the predetermined operation; the feature parameter of each initial identifier is determined according to a preset weight and the feature information, where the preset weight corresponds to the target data source and indicates the frequency at which accounts in the target data source perform the predetermined operation, and the feature parameter indicates the frequency at which the initial identifier performs the predetermined operation; and a first target identifier is obtained from the initial identifiers, where the first target identifier is the set of initial identifiers whose feature parameters are higher than the preset
  • because the target data source records the account corresponding to each identifier and the predetermined operation performed by that account, and the identifiers corresponding to the predetermined operation are acquired from it, identifiers can be obtained from more sources, avoiding the limited identifier scale of a single user log.
  • the initial identifiers are first filtered according to the feature information of the identifiers and the preset feature words; the feature parameter, which indicates the frequency at which an initial identifier performs the predetermined operation, is determined according to the preset weight and the feature information; and the first target identifier, whose feature parameters are higher than the preset parameter, is then obtained from the initial identifiers. The identifiers in the first target identifier are therefore identifiers that frequently perform the predetermined operation, which improves the accuracy of the identifiers obtained for training and thereby overcomes the low accuracy of obtaining identifiers for training in the related art.
  • the structure shown in FIG. 11 is merely illustrative; the electronic device may be a terminal device such as a smartphone (for example, an Android or iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
  • FIG. 11 does not limit the structure of the above electronic device; the electronic device may include more or fewer components than shown in FIG. 11 (such as a network interface or a display device), or have a configuration different from that shown in FIG. 11.
  • Embodiments of the present invention also provide a storage medium.
  • the storage medium may be located on at least one of a plurality of network devices in the network.
  • the storage medium is arranged to store program code for performing the following steps:
  • acquiring the identifiers corresponding to a predetermined operation from a plurality of data sources, where a target data source among the plurality of data sources records the accounts and the predetermined operations performed by those accounts;
  • S4: acquiring a first target identifier from the initial identifiers, where the first target identifier is the set of initial identifiers whose feature parameters are higher than a preset parameter.
  • the storage medium is further arranged to store program code for performing the following steps:
  • S1: acquiring a first feature word and a second feature word, where the preset feature words include the first feature word and the second feature word;
  • obtaining the initial identifiers from the identifiers, where the feature information corresponding to an initial identifier carries the first feature word and does not carry the second feature word.
  • the storage medium is further configured to store program code for performing the following steps: acquiring the preset weight, where a larger preset weight indicates that accounts in the target data source perform the predetermined operation more frequently; acquiring the time information and frequency information in the feature information, where the time information indicates when the identifier performed the predetermined operation and the frequency information indicates how often the identifier performs it; and determining the feature parameter according to the preset weight, the time information, and the frequency information, where a larger feature parameter indicates that the initial identifier performs the predetermined operation more frequently.
  • the storage medium is further configured to store program code for performing the following steps: acquiring the proportion of accounts in the target data source that perform the predetermined operation among all accounts included in the target data source, and allocating the preset weight to the target data source according to that proportion, where a larger proportion yields a larger preset weight; or acquiring the number of identical identifiers in a first identifier set and a preset identifier set, where the first identifier set is the set of initial identifiers contained in one target data source, and allocating the preset weight to the target data source according to the ratio between that number and the number of identifiers in the first identifier set, where a larger ratio yields a larger preset weight.
  • the storage medium is further configured to store program code for performing the following steps: calculating, for each target data source, the product of the time information and the frequency information corresponding to the initial identifier; and calculating the weighted sum of these products according to the preset weights to obtain the feature parameter.
  • the storage medium is further configured to store program code for performing the following steps: obtaining, from the predetermined operation corresponding to the identifier, information indicating features of the predetermined operation, where this information includes the feature words, time information, and frequency information corresponding to the predetermined operation; and storing the feature words, time information, and frequency information in a preset format to obtain the feature information.
  • the storage medium is further configured to store program code for performing one of the following: sorting the initial identifiers by feature parameter from high to low and selecting the first target identifier from the sorted identifiers, where the first target identifier includes the top N identifiers; or obtaining, from the initial identifiers, the first target identifier whose feature parameter is greater than or equal to the preset value.
  • the storage medium is further configured to store program code for performing the following steps: matching the first target identifier against the preset target identifier; if the first target identifier successfully matches the preset target identifier, determining that the first target identifier is the required identifier; and if the match fails, re-acquiring the first target identifier.
  • the storage medium is further configured to store program code for performing the following steps: determining whether the first target identifier and the preset target identifier share at least the preset number of identical identifiers; and if they do, determining that the first target identifier successfully matches the preset target identifier.
  • the storage medium is further configured to store program code for performing the following steps: acquiring the identifiers corresponding to the accounts included in the plurality of data sources; and randomly obtaining, from those identifiers, identifiers other than the first target identifier to form a second target identifier, where the second target identifier contains the same number of identifiers as the first target identifier.
  • the storage medium is further configured to store program code for performing the following steps: training the prediction model according to the first target identifier and the second target identifier; obtaining, according to the prediction model, identifiers to be pushed for the resource to be pushed from the identifiers included in the plurality of data sources; and pushing the resource to the identifiers to be pushed.
  • the foregoing storage medium may include, but is not limited to, a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, or a magnetic memory.
  • if the integrated unit in the above embodiments is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in the above computer-readable storage medium.
  • the technical solution of the present invention, in whole or in part, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions that cause one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the disclosed client may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division into units is merely a division by logical function.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units, or modules, and may be electrical or in other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • by recording, in the target data source, the account corresponding to each identifier and the predetermined operation performed by that account, and by obtaining the identifiers corresponding to the predetermined operation, the present invention broadens the sources from which identifiers are obtained and avoids relying on a single user log.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method and apparatus for obtaining an identifier, as well as a storage medium and an electronic device. The method comprises: obtaining identifiers corresponding to a predetermined operation from multiple data sources; obtaining initial identifiers from among the identifiers according to feature information of the identifiers and preset feature words; determining feature parameters of the initial identifiers according to a preset weight and the feature information; and obtaining a first target identifier from among the initial identifiers, the first target identifier being the set of initial identifiers whose feature parameters are greater than a preset parameter. The technical solution solves the technical problem in the prior art of low accuracy in obtaining identifiers for training.
PCT/CN2018/081337 2017-04-27 2018-03-30 Procédé et appareil d'obtention d'identifiant, support de stockage et dispositif électronique Ceased WO2018196553A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710290180.5A CN108304426B (zh) 2017-04-27 2017-04-27 标识的获取方法及装置
CN201710290180.5 2017-04-27

Publications (1)

Publication Number Publication Date
WO2018196553A1 true WO2018196553A1 (fr) 2018-11-01

Family

ID=62872225

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/081337 Ceased WO2018196553A1 (fr) 2017-04-27 2018-03-30 Procédé et appareil d'obtention d'identifiant, support de stockage et dispositif électronique

Country Status (2)

Country Link
CN (1) CN108304426B (fr)
WO (1) WO2018196553A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109636433A (zh) * 2018-10-16 2019-04-16 深圳壹账通智能科技有限公司 基于大数据分析的养卡识别方法、装置、设备和存储介质
CN111967915B (zh) * 2020-08-27 2024-11-26 北京明略昭辉科技有限公司 媒体文件投放方法和装置、存储介质及电子装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819804A (zh) * 2011-06-07 2012-12-12 阿里巴巴集团控股有限公司 一种商品信息的推送方法及设备
CN102831234A (zh) * 2012-08-31 2012-12-19 北京邮电大学 基于新闻内容和主题特征的个性化新闻推荐装置和方法
CN104317865A (zh) * 2014-10-16 2015-01-28 南京邮电大学 一种基于音乐情感特征匹配的社交网络搜索交友方法
CN105430504A (zh) * 2015-11-27 2016-03-23 中国科学院深圳先进技术研究院 基于电视观看日志挖掘的家庭成员结构识别方法与系统
CN106126592A (zh) * 2016-06-20 2016-11-16 北京小米移动软件有限公司 搜索数据的处理方法及装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120052683A (ko) * 2010-11-16 2012-05-24 한국전자통신연구원 지능형 서비스를 위한 다자간 상황정보 공유 장치 및 방법
CN103593368A (zh) * 2012-08-16 2014-02-19 深圳市世纪光速信息技术有限公司 数据源选择方法、服务器、终端和系统
CN104156366B (zh) * 2013-05-13 2017-11-21 中国移动通信集团浙江有限公司 一种向移动终端推荐网络应用的方法和网络服务器
CN104090888B (zh) * 2013-12-10 2016-05-11 深圳市腾讯计算机系统有限公司 一种用户行为数据的分析方法和装置

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472879A (zh) * 2019-08-20 2019-11-19 秒针信息技术有限公司 一种资源效果的评估方法、装置、电子设备及存储介质
CN110991296A (zh) * 2019-11-26 2020-04-10 腾讯科技(深圳)有限公司 视频标注方法、装置、电子设备及计算机可读存储介质
CN110991296B (zh) * 2019-11-26 2023-04-07 腾讯科技(深圳)有限公司 视频标注方法、装置、电子设备及计算机可读存储介质
CN111651657A (zh) * 2020-06-04 2020-09-11 深圳前海微众银行股份有限公司 情报监控方法、装置、设备及计算机可读存储介质
CN111651657B (zh) * 2020-06-04 2024-05-24 深圳前海微众银行股份有限公司 情报监控方法、装置、设备及计算机可读存储介质
CN112187746A (zh) * 2020-09-15 2021-01-05 北京明略昭辉科技有限公司 一种设备标识的生成方法及装置
CN113780744A (zh) * 2021-08-13 2021-12-10 唯品会(广州)软件有限公司 货物组合方法、装置及电子设备
CN113780744B (zh) * 2021-08-13 2023-12-29 唯品会(广州)软件有限公司 货物组合方法、装置及电子设备
CN114461699A (zh) * 2022-01-28 2022-05-10 嘉兴职业技术学院 一种基于跨境电商平台的大数据用户挖掘方法
CN114461699B (zh) * 2022-01-28 2024-06-04 嘉兴职业技术学院 一种基于跨境电商平台的大数据用户挖掘方法

Also Published As

Publication number Publication date
CN108304426A (zh) 2018-07-20
CN108304426B (zh) 2021-12-17

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18791810

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18791810

Country of ref document: EP

Kind code of ref document: A1