
CN116663677B - Model training method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116663677B
Authority
CN
China
Prior art keywords
information
wifi
scanning
interest
occurrence information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310678757.5A
Other languages
Chinese (zh)
Other versions
CN116663677A (en
Inventor
冯朝阳
尹卜一
焦恒建
王畔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Original Assignee
Douyin Vision Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Douyin Vision Co Ltd filed Critical Douyin Vision Co Ltd
Priority to CN202310678757.5A priority Critical patent/CN116663677B/en
Publication of CN116663677A publication Critical patent/CN116663677A/en
Priority to PCT/CN2024/096051 priority patent/WO2024251004A1/en
Application granted granted Critical
Publication of CN116663677B publication Critical patent/CN116663677B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/021 Services related to particular areas, e.g. point of interest [POI] services, venue services or geofences
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The present disclosure provides a model training method, apparatus, device and medium. The method comprises: obtaining historical wireless fidelity (WiFi) scanning information and historical WiFi distribution information, where the historical WiFi scanning information comprises multiple groups of WiFi scanning information, each group of WiFi scanning information comprises at least one piece of first WiFi information, and the historical WiFi distribution information comprises the second WiFi information associated with each point of interest; for each point of interest, generating co-occurrence information corresponding to the point of interest based on the second WiFi information associated with the point of interest and the at least one piece of first WiFi information in each group of WiFi scanning information; for each piece of co-occurrence information, generating training sample data based on the co-occurrence information and the label corresponding to the co-occurrence information; and training a model to be trained based on the training sample data to obtain a trained model, where the trained model is used to determine the corresponding point of interest positioning information based on current WiFi scanning information. The embodiments of the present disclosure help improve the accuracy of positioning results.

Description

Model training method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of positioning technologies, and in particular relates to a model training method, a model training device, electronic equipment and a computer storage medium.
Background
WiFi fingerprint positioning is one of the important technologies for point of interest (POI) positioning. It is widely used in the industry because it requires no manual deployment of equipment and has good spatial distribution characteristics. In common fingerprint positioning, the current interest point of a user is determined based on cross information between the WiFi information scanned by the user's electronic device and previously collected historical WiFi fingerprint information. If the cross information contains errors, the reliability of the final positioning result is affected. How to improve the reliability of the positioning result is therefore an urgent problem to be solved.
Disclosure of Invention
The embodiment of the disclosure at least provides a model training method, a device, electronic equipment and a storage medium, which can improve the credibility of a positioning result.
The embodiment of the disclosure provides a model training method, which comprises the following steps:
The method comprises the steps of acquiring historical wireless fidelity WiFi scanning information and historical WiFi distribution information, wherein the historical WiFi scanning information comprises a plurality of groups of WiFi scanning information, each group of WiFi scanning information comprises at least one first WiFi information, and the historical WiFi distribution information comprises second WiFi information associated with each interest point;
For each interest point, generating co-occurrence information corresponding to the interest point based on the second WiFi information associated with the interest point and the at least one first WiFi information in each group of WiFi scanning information, wherein the co-occurrence information is used for representing the scanning state of each first WiFi information in each group relative to the second WiFi information associated with the interest point;
Generating training sample data according to the co-occurrence information and labels corresponding to the co-occurrence information, wherein the labels corresponding to the co-occurrence information are determined based on preset offline behavior information associated with interest points corresponding to the co-occurrence information;
Training a model to be trained based on the data of each training sample to obtain a trained model, wherein the trained model is used for determining corresponding interest point positioning information based on current WiFi scanning information.
In the embodiment of the disclosure, co-occurrence information corresponding to the interest point is constructed based on the historical WiFi distribution information and the historical WiFi scanning information, and because the co-occurrence information characterizes the scanning state of the historical scanned WiFi information relative to the second WiFi information associated with the interest point, training sample data is generated based on the co-occurrence information to train a model to be trained, so that the memory of the model on the association relationship between the WiFi scanning information and the interest point can be enhanced, a trained model is obtained, further, interest point prediction can be performed based on the trained model, and the reliability of a positioning result is further improved.
In one possible implementation manner, each interest point has corresponding interest point identification information, and the generating co-occurrence information corresponding to the interest point based on the second WiFi information associated with the interest point and at least one first WiFi information in each group of WiFi scanning information comprises the following steps:
For each interest point, generating scanning distribution identification information corresponding to the interest point based on the second WiFi information associated with the interest point and the first WiFi information in each group;
and generating the co-occurrence information corresponding to the interest point based on the interest point identification information of the interest point and the scanning distribution identification information corresponding to the interest point.
In the embodiment of the disclosure, the co-occurrence information corresponding to the interest point is generated based on the unique interest point identification information of the interest point and the scanning distribution identification information corresponding to the interest point, so that the co-occurrence information has the characteristics of the interest point and the corresponding associated WiFi characteristics, and when the model is trained based on the co-occurrence information in the subsequent step, the model can learn the association relationship between the interest point and the corresponding WiFi, which is beneficial to improving the precision of the model.
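As an illustration only, the combination of the interest point identification information and the scanning distribution identification information could be sketched as follows. The function name `build_cooccurrence_info`, the underscore separator and the example values are assumptions; the disclosure states only that the co-occurrence information is generated from the two identifiers.

```python
def build_cooccurrence_info(poi_id: str, scan_distribution_id: str, sep: str = "_") -> str:
    """Join a POI identifier and a scan distribution identifier into one string.

    The separator is a hypothetical choice; any unambiguous joining rule
    would serve the same purpose.
    """
    return f"{poi_id}{sep}{scan_distribution_id}"


# Hypothetical POI identifier and scan distribution identifier.
info = build_cooccurrence_info("poi123", "11000000")
print(info)
```

Because the resulting string carries both the interest point identity and its scan pattern, two interest points with the same scan pattern still yield distinct co-occurrence information.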
In a possible implementation manner, the first WiFi information includes a first WiFi and a signal strength of the first WiFi, the second WiFi information includes a second WiFi and a signal strength of the second WiFi, and a distribution sequence exists between the second WiFi associated with each interest point, and the distribution sequence is determined by the signal strengths of the respective second WiFi;
Generating, for each interest point, scan distribution identification information corresponding to the interest point based on the second WiFi information associated with the interest point and the first WiFi information in each group, including:
Generating sub-scan distribution identification information for each second WiFi associated with each point of interest based on the second WiFi and each first WiFi in each group;
And splicing all sub-scanning distribution identification information according to the distribution sequence among the second WiFi related to the interest points to generate the scanning distribution identification information.
In the embodiment of the disclosure, since each interest point is associated with at least one second WiFi, corresponding sub-scanning distribution identification information can be generated for each second WiFi, and then each sub-scanning distribution identification information is spliced to obtain scanning distribution identification information, so that accuracy of the scanning distribution identification information can be improved, accuracy of subsequent generated training sample data is improved, and accuracy of model training is improved.
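The splicing step can be sketched as below, assuming the distribution sequence orders the second WiFi from strongest to weakest signal (the disclosure says only that the sequence is determined by signal strength). The function name `splice_scan_distribution_id`, the dictionary layout and the sample values are hypothetical.

```python
from typing import Dict


def splice_scan_distribution_id(poi_wifi_strength: Dict[str, int],
                                sub_ids: Dict[str, str]) -> str:
    """Concatenate per-WiFi sub-identifiers in the POI's distribution order.

    poi_wifi_strength maps each second WiFi (keyed by MAC address) to its
    signal strength in dBm; sub_ids maps the same MAC addresses to their
    sub-scanning distribution identification strings.
    """
    # Assumed ordering: strongest signal first.
    ordered_macs = sorted(poi_wifi_strength,
                          key=lambda mac: poi_wifi_strength[mac],
                          reverse=True)
    return "".join(sub_ids[mac] for mac in ordered_macs)


poi_wifi = {"aa:aa": -70, "bb:bb": -40}
sub_ids = {"aa:aa": "0000", "bb:bb": "1100"}
scan_id = splice_scan_distribution_id(poi_wifi, sub_ids)
print(scan_id)
```

Fixing the order per interest point keeps each bit position tied to the same second WiFi across all scan groups.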
In one possible implementation manner, the sub-scanning distribution identification information corresponds to one scanning flag bit and a plurality of signal strength flag bits, wherein different signal strength flag bits are used for representing different signal strengths, and the generating sub-scanning distribution identification information for each second WiFi associated with each interest point based on the second WiFi and each first WiFi in each group comprises:
For each second WiFi, determining that the value of the scanning flag bit is 1 when there is a target first WiFi with the same media access control (MAC) address as the second WiFi;
Determining, from the signal strength flag bits, a target signal strength flag bit corresponding to the signal strength of the target first WiFi, determining the value of the target signal strength flag bit as 1, and determining the values of the signal strength flag bits other than the target signal strength flag bit as 0;
And generating the sub-scanning distribution identification information based on the value of the scanning flag bit and the values of the signal strength flag bits.
In the embodiment of the disclosure, the sub-scanning distribution identification information is determined from the value of the scanning flag bit and the values of the signal strength flag bits, which is beneficial to improving the accuracy of the sub-scanning distribution identification information.
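A minimal sketch of the flag-bit construction is given below. The number of signal strength flag bits, the dBm thresholds, the all-zero result for a second WiFi that was not scanned, and the name `sub_scan_id` are assumptions made for illustration; the disclosure specifies only one scanning flag bit followed by signal strength flag bits of which exactly one is set for a matched WiFi.

```python
from typing import Dict, Sequence


def sub_scan_id(second_mac: str,
                scan_group: Dict[str, int],
                thresholds: Sequence[int] = (-50, -70)) -> str:
    """Build the sub-scanning distribution identifier for one second WiFi.

    scan_group maps the MAC address of each first WiFi in one scan group
    to its signal strength in dBm. thresholds (descending) split the
    signal strength range into len(thresholds) + 1 buckets.
    """
    n_strength_bits = len(thresholds) + 1
    if second_mac not in scan_group:
        # Assumed behaviour: nothing matched, so every bit stays 0.
        return "0" * (1 + n_strength_bits)

    rssi = scan_group[second_mac]
    # Pick the first bucket whose threshold the signal meets,
    # falling back to the weakest bucket.
    bucket = next((i for i, t in enumerate(thresholds) if rssi >= t),
                  len(thresholds))
    strength_bits = ["0"] * n_strength_bits
    strength_bits[bucket] = "1"
    # Scanning flag bit first, then the signal strength flag bits.
    return "1" + "".join(strength_bits)


group = {"aa:aa": -45, "bb:bb": -60, "cc:cc": -90}
print([sub_scan_id(mac, group) for mac in ("aa:aa", "bb:bb", "cc:cc", "dd:dd")])
```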
In one possible implementation, the format of the co-occurrence information is a character string format, and the generating training sample data for each co-occurrence information based on the co-occurrence information and a label corresponding to the co-occurrence information includes:
For each piece of co-occurrence information, performing feature processing on the co-occurrence information in the character string format to generate a feature vector corresponding to the co-occurrence information;
acquiring preset offline behavior information associated with the interest point corresponding to the co-occurrence information, and determining a true value corresponding to the label based on the preset offline behavior information;
and generating the training sample data based on the feature vector and the true value corresponding to the label.
In the embodiment of the disclosure, the co-occurrence information in the character string format is converted into the feature vector, so that model training of subsequent steps is facilitated. In addition, the true value of the feature vector is determined based on the preset offline behavior information, and the true value is calibrated for the feature vector, so that the accuracy of the training sample data can be improved.
In a possible implementation manner, the determining the true value corresponding to the label based on the preset offline behavior information includes:
determining the true value corresponding to the label as 1 in the case that the preset offline behavior occurs, or
determining the true value corresponding to the label as 0 in the case that the preset offline behavior does not occur.
In the embodiment of the disclosure, if the preset offline behavior is performed at the interest point, the true value corresponding to the label is determined to be 1, and if the preset offline behavior is not performed at the interest point, the true value corresponding to the label is determined to be 0, so that the accuracy of the true value determination, and hence of the training sample data, can be improved.
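The labelling rule and the pairing of a feature vector with its true value can be sketched as follows. The function name `make_training_sample` and the boolean input are hypothetical; how the preset offline behavior is detected is outside the scope of this sketch.

```python
from typing import List, Tuple


def make_training_sample(feature_vector: List[int],
                         offline_behavior_occurred: bool) -> Tuple[List[int], int]:
    """Pair a feature vector with its label: 1 if the preset offline
    behavior occurred at the interest point, otherwise 0."""
    label = 1 if offline_behavior_occurred else 0
    return feature_vector, label


positive = make_training_sample([0, 1, 0], True)
negative = make_training_sample([1, 0, 0], False)
print(positive, negative)
```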
In one possible implementation manner, the performing feature processing on the co-occurrence information in the character string format for each piece of co-occurrence information to generate a feature vector corresponding to the co-occurrence information includes:
numbering each piece of co-occurrence information in the character string format according to a preset numbering mode, and constructing a feature dictionary based on the numbers corresponding to the respective co-occurrence information, wherein the feature dictionary is used for converting the co-occurrence information in the character string format into feature vectors;
and, for each piece of co-occurrence information in the character string format, performing one-hot encoding on the co-occurrence information based on the feature dictionary to generate the feature vector corresponding to the co-occurrence information.
In the embodiment of the disclosure, the co-occurrence information in the character string format is converted into features through the constructed feature dictionary, which improves the efficiency of subsequent model training. In addition, one-hot encoding represents each feature vector with a single active state bit, which simplifies the logic of the feature conversion.
In one possible implementation manner, the performing one-hot encoding on the co-occurrence information in the character string format based on the feature dictionary to generate a feature vector corresponding to the co-occurrence information includes:
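One way to construct such a feature dictionary is sketched below. The disclosure does not specify the preset numbering mode, so sequential integers starting at 0 in first-seen order are assumed here; `build_feature_dictionary` and the sample strings are hypothetical.

```python
from typing import Dict, Iterable


def build_feature_dictionary(cooccurrence_infos: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct co-occurrence string a number.

    Assumed numbering mode: 0, 1, 2, ... in order of first appearance.
    """
    dictionary: Dict[str, int] = {}
    for info in cooccurrence_infos:
        if info not in dictionary:
            dictionary[info] = len(dictionary)
    return dictionary


infos = ["poiA_1100", "poiB_0000", "poiA_1100", "poiB_1010"]
feature_dictionary = build_feature_dictionary(infos)
print(feature_dictionary)
```

Repeated co-occurrence strings map to the same number, so the dictionary size equals the number of distinct strings.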
Determining the number of numbers contained in the feature dictionary, and creating a zero vector with a vector length of the number;
and modifying the value of a target index bit which is the same as the number in the zero vector into 1 according to the number corresponding to the co-occurrence information in the feature dictionary for each co-occurrence information, and generating a feature vector corresponding to the co-occurrence information.
In the embodiment of the disclosure, for each co-occurrence information, the target index bit in the zero vector is modified based on the number of the co-occurrence information in the feature dictionary, so that the accuracy of the feature vector can be improved.
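The zero-vector step can be sketched as follows, assuming the feature dictionary numbers its entries from 0 so that a number can be used directly as a vector index. The function name `one_hot` and the example dictionary are hypothetical.

```python
from typing import Dict, List


def one_hot(info: str, feature_dictionary: Dict[str, int]) -> List[int]:
    """One-hot encode a co-occurrence string using the feature dictionary."""
    # Zero vector whose length equals the number of dictionary entries.
    vector = [0] * len(feature_dictionary)
    # Set the index bit that matches the string's number to 1.
    vector[feature_dictionary[info]] = 1
    return vector


dictionary = {"poiA_1100": 0, "poiB_0000": 1, "poiB_1010": 2}
print(one_hot("poiB_0000", dictionary))
```

A co-occurrence string absent from the dictionary raises `KeyError` in this sketch; how unseen strings should be handled is not addressed here.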
In a possible implementation manner, the training the model to be trained based on each training sample data to obtain a trained model includes:
inputting each training sample data into the model to be trained to obtain a prediction result corresponding to each training sample data;
Based on the prediction result corresponding to each training sample data and the label of each training sample data, adjusting model parameters of the model to be trained;
repeating the above process until the training result meets the preset requirement, and obtaining the trained model.
In the embodiment of the disclosure, the model is subjected to supervised training based on the prediction result corresponding to each training sample data and the label of each training sample data, so that the performance of model training can be improved, and the prediction precision of the model is further improved.
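The predict, compare and adjust loop described above can be sketched with a small logistic regression trained by stochastic gradient descent. The model type, learning rate, epoch limit and loss threshold are all assumptions chosen for illustration; this passage of the disclosure does not fix the model or the preset requirement.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[int], int]


def predict(weights: List[float], bias: float, x: Sequence[int]) -> float:
    """Return the predicted probability that the label is 1."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[Sample], lr: float = 0.5, max_epochs: int = 1000,
          target_loss: float = 0.1) -> Tuple[List[float], float]:
    """Repeat prediction and parameter adjustment until the mean
    cross-entropy loss drops below target_loss or max_epochs is reached."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        total_loss = 0.0
        for x, y in samples:
            p = predict(weights, bias, x)
            # Clamp to avoid log(0) in the loss computation.
            p_safe = min(max(p, 1e-12), 1.0 - 1e-12)
            total_loss -= y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe)
            # Gradient of the cross-entropy loss with respect to z is (p - y).
            error = p - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
        if total_loss / len(samples) < target_loss:
            break
    return weights, bias


# Two one-hot feature vectors with labels 1 and 0 (toy data).
samples = [([1, 0], 1), ([0, 1], 0)]
weights, bias = train(samples)
print(predict(weights, bias, [1, 0]), predict(weights, bias, [0, 1]))
```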
The embodiment of the disclosure provides a method for positioning an interest point, which comprises the following steps:
acquiring current WiFi scanning information of current equipment and associated WiFi information corresponding to each interest point corresponding to an area where the current equipment is located;
generating, for each interest point, current co-occurrence information based on the current WiFi scanning information and the associated WiFi information corresponding to the interest point;
And inputting each piece of current co-occurrence information into a trained model to obtain the interest point positioning information of the current equipment, wherein the trained model is obtained through the embodiment of the model training method.
In the embodiment of the disclosure, the interest point positioning information of the current equipment is determined based on the trained model, so that the accuracy of the interest point positioning can be improved.
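A possible shape of the positioning step is sketched below. It assumes that the trained model can be wrapped as a function returning one score per piece of current co-occurrence information and that the interest point with the highest score is reported; the disclosure does not state how the model outputs are turned into positioning information, so both points are assumptions, as are the name `locate_poi` and the toy scorer.

```python
from typing import Callable, Dict


def locate_poi(current_cooccurrence: Dict[str, str],
               score: Callable[[str], float]) -> str:
    """Return the identifier of the candidate POI whose current
    co-occurrence information receives the highest model score."""
    return max(current_cooccurrence,
               key=lambda poi_id: score(current_cooccurrence[poi_id]))


def toy_score(info: str) -> float:
    """Stand-in for a trained model: counts the '1' characters."""
    return float(info.count("1"))


candidates = {"poiA": "poiA_1100", "poiB": "poiB_0000"}
best = locate_poi(candidates, toy_score)
print(best)
```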
The embodiment of the disclosure provides a model training device, comprising:
The information acquisition module is used for acquiring historical wireless fidelity WiFi scanning information and historical WiFi distribution information, wherein the historical WiFi scanning information comprises a plurality of groups of WiFi scanning information, each group of WiFi scanning information comprises at least one first WiFi information, and the historical WiFi distribution information comprises second WiFi information associated with each interest point;
The information generation module is used for generating co-occurrence information corresponding to each interest point based on the second WiFi information associated with the interest point and at least one first WiFi information in each group of WiFi scanning information, wherein the co-occurrence information is used for representing the scanning state of each first WiFi information in each group relative to the second WiFi information associated with the interest point;
The sample generation module is used for generating, for each piece of co-occurrence information, training sample data based on the co-occurrence information and a label corresponding to the co-occurrence information;
The model training module is used for training the model to be trained based on the training sample data to obtain a trained model, and the trained model is used for determining corresponding interest point positioning information based on the current WiFi scanning information.
In one possible implementation manner, each interest point has corresponding interest point identification information, and the information generation module is specifically configured to:
For each interest point, generating scanning distribution identification information corresponding to the interest point based on the second WiFi information associated with the interest point and the first WiFi information in each group;
and generating the co-occurrence information corresponding to the interest point based on the interest point identification information of the interest point and the scanning distribution identification information corresponding to the interest point.
In one possible implementation manner, the first WiFi information includes a first WiFi and a signal strength of the first WiFi, the second WiFi information includes a second WiFi and a signal strength of the second WiFi, a distribution sequence exists between the second WiFi associated with each interest point, the distribution sequence is determined by the signal strengths of the second WiFi, and the information generating module is specifically configured to:
Generating sub-scan distribution identification information for each second WiFi associated with each point of interest based on the second WiFi and each first WiFi in each group;
And splicing all sub-scanning distribution identification information according to the distribution sequence among the second WiFi related to the interest points to generate the scanning distribution identification information.
In one possible implementation manner, the sub-scanning distribution identification information corresponds to one scanning flag bit and a plurality of signal strength flag bits, different signal strength flag bits are used for representing different signal strengths, and the information generating module is specifically configured to:
For each second WiFi, determining that the value of the scanning flag bit is 1 when there is a target first WiFi with the same media access control (MAC) address as the second WiFi;
Determining, from the signal strength flag bits, a target signal strength flag bit corresponding to the signal strength of the target first WiFi, determining the value of the target signal strength flag bit as 1, and determining the values of the signal strength flag bits other than the target signal strength flag bit as 0;
And generating the sub-scanning distribution identification information based on the value of the scanning flag bit and the values of the signal strength flag bits.
In one possible implementation, the format of the co-occurrence information is a character string format, and the sample generation module is specifically configured to:
For each piece of co-occurrence information, performing feature processing on the co-occurrence information in the character string format to generate a feature vector corresponding to the co-occurrence information;
acquiring preset offline behavior information associated with the interest point corresponding to the co-occurrence information, and determining a true value corresponding to the label based on the preset offline behavior information;
and generating the training sample data based on the feature vector and the true value corresponding to the label.
In one possible implementation manner, the sample generation module is specifically configured to:
determining the true value corresponding to the label as 1 in the case that the preset offline behavior occurs, or
determining the true value corresponding to the label as 0 in the case that the preset offline behavior does not occur.
In one possible implementation manner, the sample generation module is specifically configured to:
numbering each co-occurrence information of the character string format according to a preset numbering mode, and constructing a feature dictionary based on the numbers corresponding to the co-occurrence information respectively, wherein the feature dictionary is used for converting the co-occurrence information of the character string format into feature vectors;
and, for each piece of co-occurrence information in the character string format, performing one-hot encoding on the co-occurrence information based on the feature dictionary to generate the feature vector corresponding to the co-occurrence information.
In one possible implementation manner, the sample generation module is specifically configured to:
Determining the number of numbers contained in the feature dictionary, and creating a zero vector with a vector length of the number;
and modifying the value of a target index bit which is the same as the number in the zero vector into 1 according to the number corresponding to the co-occurrence information in the feature dictionary for each co-occurrence information, and generating a feature vector corresponding to the co-occurrence information.
In one possible implementation, the model training module is specifically configured to:
Inputting each training sample data into the model to be trained to obtain a prediction result corresponding to each training sample;
Based on the prediction result corresponding to each training sample and the label of each training sample data, adjusting model parameters of the model to be trained;
repeating the above process until the training result meets the preset requirement, and obtaining the trained model.
The embodiment of the disclosure provides an interest point positioning device, which comprises:
the acquisition module is used for acquiring current WiFi scanning information of the current equipment and associated WiFi information corresponding to each interest point corresponding to the area where the current equipment is located;
The generating module is used for generating, for each interest point, current co-occurrence information according to the current WiFi scanning information and the associated WiFi information corresponding to the interest point;
The positioning module is used for inputting the current co-occurrence information into the trained model to obtain the interest point positioning information of the current equipment, wherein the trained model is obtained through the embodiment of the model training method.
The embodiment of the disclosure also provides an electronic device, which comprises a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, when the electronic device is running, the processor and the memory are communicated through the bus, and the machine-readable instructions are executed by the processor to perform the steps of the model training method or the steps of the interest point positioning method in any one of the possible implementation manners.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the model training method described in any one of the possible implementations above or the steps of the point of interest positioning method described above.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments are briefly described below. The drawings are incorporated in and constitute a part of the specification; they show embodiments consistent with the present disclosure and, together with the description, serve to illustrate the technical solutions of the present disclosure. It is to be understood that the following drawings illustrate only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; a person of ordinary skill in the art may derive other related drawings from them without inventive effort.
FIG. 1 illustrates a flow chart of a model training method provided by an embodiment of the present disclosure;
FIG. 2 illustrates a flow chart of a method for generating co-occurrence information provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a generation process of sub-scan distribution identification information provided by an embodiment of the present disclosure;
FIG. 4 illustrates a flowchart of a method for generating training data samples provided by embodiments of the present disclosure;
FIG. 5 illustrates a flow chart of a feature processing method for co-occurrence information provided by an embodiment of the present disclosure;
FIG. 6 illustrates a flow chart of a method of interest point location provided by an embodiment of the present disclosure;
FIG. 7 illustrates a schematic diagram of a model training apparatus provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a point of interest locating device according to an embodiment of the present disclosure;
FIG. 9 illustrates a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist; for example, "A and/or B" covers three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization should be obtained.
Research shows that, in common fingerprint positioning techniques, the user's current interest point is usually determined based on cross information between the WiFi information scanned by the electronic device and the historically collected WiFi fingerprint information. The accuracy of the cross-information calculation directly affects the reliability of the final positioning result, so improving the accuracy of the cross information in turn improves the reliability of the positioning result.
In the fields of personalized information recommendation, information scanning service and online advertising, a Click-through Rate (CTR) estimation model is one of important technologies and is used for learning and predicting feedback information of a user, wherein the feedback information of the user can be behavior information such as clicking, collecting or purchasing performed by the user. The CTR model realizes the functions of information recommendation and the like through the memory capacity of the CTR model, wherein the memory capacity refers to the capacity of the model to directly learn and utilize the 'co-occurrence frequency' of the request and the historical fingerprint in the historical data. Generally, collaborative filtering models, logistic regression models and other models have stronger memory capacity, and because the models have simple structures, historical data can often directly influence recommended results, namely, the models can learn the distribution characteristics of the historical data, and the results are predicted by utilizing the memory of the models.
Based on the above research, the present disclosure provides a model training method and device, an electronic device, and a storage medium. First, historical WiFi scanning information and historical WiFi distribution information are acquired, where the historical WiFi scanning information comprises multiple groups of WiFi scanning information, each group of WiFi scanning information comprises at least one first WiFi information, and the historical WiFi distribution information comprises second WiFi information associated with each interest point. Next, for each interest point, co-occurrence information corresponding to the interest point is generated based on the second WiFi information associated with the interest point and at least one first WiFi information in each group of WiFi scanning information; the co-occurrence information represents the scanning state of each first WiFi information in each group relative to the second WiFi information associated with the interest point. Then, for each piece of co-occurrence information, training sample data is generated based on the co-occurrence information and the label corresponding to the co-occurrence information, where the label is determined based on preset offline behavior information associated with the interest point corresponding to the co-occurrence information. Finally, a model to be trained is trained based on the training sample data to obtain a trained model, and the trained model is used to determine corresponding interest point positioning information based on current WiFi scanning information.
In the embodiment of the disclosure, corresponding co-occurrence information is constructed based on the historical WiFi distribution information and the historical WiFi scanning information, and because the co-occurrence information characterizes the scanning state of the WiFi information scanned by each first user through the electronic device relative to at least one WiFi information covering the interest point, training sample data is generated based on the co-occurrence information to train the model, so that the memory of the model on the association relationship between the WiFi scanning information and the interest point can be enhanced, a trained model is obtained, further, interest point prediction can be performed based on the trained model, and the reliability of the positioning result is further improved.
For the convenience of understanding the present embodiment, first, a detailed description will be given of an execution body of the model training method provided in the embodiment of the present disclosure. The execution subject of the model training method provided by the embodiment of the disclosure is electronic equipment. In this embodiment, the electronic device is a server, and the server may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, cloud storage, big data, and an artificial intelligence platform. In other embodiments, the electronic device may also be a terminal device. The terminal device may be a mobile device, a user terminal, a handheld device, a computing device, a wearable device, or the like. The model training method may be implemented by a processor invoking computer readable instructions stored in a memory.
The model training method provided by the embodiment of the application is described in detail below with reference to the accompanying drawings. Referring to fig. 1, a flowchart of a model training method according to an embodiment of the disclosure is shown, where the method includes steps S101 to S104, where:
S101, acquiring historical wireless fidelity (WiFi) scanning information and historical WiFi distribution information, wherein the historical WiFi scanning information comprises a plurality of groups of WiFi scanning information, each group of WiFi scanning information comprises at least one first WiFi information, and the historical WiFi distribution information comprises second WiFi information associated with each interest point.
The historical WiFi scan information may be WiFi information scanned by an electronic device, that is, may refer to multiple sets of WiFi scan information scanned by different electronic devices, and since WiFi signals have corresponding coverage areas, each set of WiFi scan information includes at least one first WiFi information. The first WiFi information comprises first WiFi and signal strength of the first WiFi, and each first WiFi corresponds to one interest point.
The electronic device may be a terminal device, such as a smart phone, a tablet computer, or a smart watch, which is not limited herein. It should be appreciated that the historical WiFi information is the WiFi information collected during the historical time (e.g., the previous month).
The historical WiFi distribution information comprises second WiFi information associated with each interest point, the second WiFi information comprises second WiFi and signal intensity of the second WiFi, a distribution sequence exists among the second WiFi associated with each interest point, and the distribution sequence is determined by the signal intensity of each second WiFi.
The historical WiFi fingerprint distribution information may be WiFi distribution information under a point of interest (POI) dimension constructed based on historical WiFi scan information, that is, the historical WiFi fingerprint distribution information may be WiFi distribution constructed according to multiple sets of WiFi scan information scanned in a historical time for each point of interest, and because signal strengths of the second WiFi are different, for each point of interest, a corresponding distribution sequence exists between the second WiFi associated with the point of interest.
For example, for the point of interest a, the corresponding second WiFi may include W1, W2, and W3, where the distances of W1, W2, and W3 with respect to the point of interest a are different, so that the signal strengths of the second WiFi associated with the point of interest a are also different, for example, the signal strengths between W1, W2, and W3 are W1> W2> W3, and then the distribution order between the second WiFi may be W1, W2, and W3.
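The ordering in the W1, W2, W3 example can be sketched as a sort by signal strength. This is a minimal illustration only: the function name `distribution_order` and the dBm values are hypothetical and are not taken from the disclosure.

```python
def distribution_order(second_wifi_strengths):
    """Return second WiFi names sorted from strongest to weakest signal."""
    return sorted(second_wifi_strengths,
                  key=second_wifi_strengths.get,
                  reverse=True)


# Hypothetical signal strengths (dBm) of the second WiFi associated with one interest point.
poi_a = {"W2": -62, "W1": -48, "W3": -71}
order = distribution_order(poi_a)
print(order)
```

A stronger (less negative) reading places a WiFi earlier in the distribution order, matching W1 > W2 > W3 in the example above.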
In the embodiment of the disclosure, the interest point may refer to one store in a target location, which may be a mall, an office building, or the like, and the interest point may be one store in the mall, for example. Each interest point has corresponding unique interest point identification information, and the interest point identification information can be identification information formed by numbers, letters and the like.
S102, for each interest point, generating co-occurrence information corresponding to the interest point based on the second WiFi information associated with the interest point and at least one first WiFi information in each group of WiFi scanning information, wherein the co-occurrence information is used for representing the scanning state of each first WiFi information in each group relative to the second WiFi information associated with the interest point.
Wherein, the scan state may refer to a scanned state and an unscanned state.
Here, the second WiFi information associated with each point of interest may be compared with at least one first WiFi information in each set of WiFi scan information, so it may be determined whether the first WiFi information identical to the second WiFi information associated with the point of interest exists in the at least one first WiFi information, and further it may be determined a scan state of each first WiFi information relative to the second WiFi information associated with the point of interest.
Optionally, for step S102, when generating co-occurrence information corresponding to each interest point based on the second WiFi information associated with the interest point and at least one first WiFi information in each set of WiFi scan information, referring to fig. 2, the following steps S1021 to S1022 may be included:
S1021, for each interest point, generating scanning distribution identification information corresponding to the interest point based on the second WiFi information associated with the interest point and the first WiFi information in each group.
In this step, for each interest point, the second WiFi information associated with the interest point and the first WiFi information in each set of WiFi scan information may be compared, so as to generate scan distribution identification information corresponding to the interest point.
Specifically, when generating, for each point of interest, scan distribution identification information corresponding to the point of interest based on the second WiFi information associated with the point of interest and the first WiFi information in each group, the method may include the following steps (1) - (2):
(1) For each second WiFi associated with each point of interest, generating sub-scan distribution identification information based on the second WiFi and each first WiFi in each group.
The sub-scanning distribution identification information corresponds to one scanning flag bit and a plurality of signal strength flag bits. The scanning flag bit represents whether the corresponding second WiFi is scanned. The signal strength flag bits represent signal strength, with different signal strength flag bits corresponding to different signal strength ranges. In this embodiment, there are 8 signal strength flag bits; for example, -70 dBm to -60 dBm corresponds to signal strength flag bit 0, and -60 dBm to -50 dBm corresponds to signal strength flag bit 1.
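One possible way to map a measured signal strength to a signal strength flag bit is sketched below. The disclosure only gives the two ranges above, so the following are assumptions made for illustration: every range is 10 dB wide, ranges are half-open (the lower bound is included), and readings outside the covered span are clamped to the nearest flag bit. The function name `signal_bucket_index` is hypothetical.

```python
def signal_bucket_index(rssi_dbm, lowest_dbm=-70, width_db=10, num_buckets=8):
    """Map a signal strength reading to a flag bit index in 0..num_buckets-1.

    Assumption: equal-width, half-open ranges starting at lowest_dbm;
    out-of-range readings are clamped (not specified in the disclosure).
    """
    index = int((rssi_dbm - lowest_dbm) // width_db)
    return max(0, min(num_buckets - 1, index))


print(signal_bucket_index(-65))  # falls in -70 dBm to -60 dBm
print(signal_bucket_index(-55))  # falls in -60 dBm to -50 dBm
```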
Specifically, for each second WiFi, a scanning flag bit may be determined according to whether a first WiFi in each set of first WiFi information is the same as the second WiFi, a signal strength flag bit is determined based on a signal strength of each first WiFi in the first WiFi information, and sub-scanning distribution identification information is generated based on the scanning flag bit and the signal strength flag bit. Specifically, the method comprises the following steps (a) - (c):
(a) For each second WiFi, determining that the value of the scanning flag bit is 1 when there is a target first WiFi with the same media access control (MAC) address as the second WiFi.
(b) Determining, from the signal strength flag bits, a target signal strength flag bit corresponding to the signal strength of the target first WiFi, determining the value of the target signal strength flag bit as 1, and determining the values of the signal strength flag bits other than the target signal strength flag bit as 0.
(c) Generating the sub-scanning distribution identification information based on the value of the scanning flag bit and the values of the signal strength flag bits.
In this embodiment, when determining the sub-scan distribution identification information according to the value of the scan flag bit and the value of the signal strength flag bit, the determination may be performed according to a generation formula of the sub-scan distribution identification information, as shown in formula (1):
$ID_n = Tag_{scan} \times (2^{8} + 2^{I})$ (1)
Here, $ID_n$ is the sub-scanning distribution identification information. The scanning flag bit $Tag_{scan}$ indicates whether a target first WiFi with the same MAC address as the second WiFi exists in the first WiFi information: if the target first WiFi exists, the value of the scanning flag bit is 1; if it does not exist, the value of the scanning flag bit is 0. $I$ is the index value of the signal bucket into which the signal strength of the target first WiFi falls, where a signal bucket refers to a signal strength range.
For example, please refer to fig. 3, which is a schematic diagram illustrating a process for generating sub-scan distribution identification information according to an embodiment of the present disclosure. As shown in fig. 3, the point of interest POI-1 is distributed with 5 second WiFi (W1, W2, W3, W4, W5), for each second WiFi (taking W1 as an example), comparing W1 with each first WiFi in each set of first WiFi information, if there is a target first WiFi with the same MAC address as W1, determining the value of the scanning flag bit to be 1, then determining the target signal strength flag bit corresponding to the signal strength of the target first WiFi according to the signal strength of the target first WiFi, assigning the value of the target signal strength flag bit to be 1, and assigning the values of other signal strength flag bits to be 0, so as to obtain a binary code 100010000 with a length of 9, and further converting the binary code to be a decimal integer, thereby obtaining the sub-scanning distribution identification information 272.
Similarly, it may be determined that the sub-scan distribution identification information corresponding to W2 is 000, the sub-scan distribution identification information corresponding to W3 is 272, the sub-scan distribution identification information corresponding to W4 is 258, and the sub-scan distribution identification information corresponding to W5 is 288.
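The values in this example can be checked with a short sketch of formula (1), read as a scanning flag multiplied by (2^8 + 2^I). The function name `sub_scan_id` is hypothetical, and the bucket indices 4, 1, and 5 are inferred here from the identification values 272, 258, and 288 rather than stated in the text.

```python
def sub_scan_id(scanned, bucket_index=0):
    """Compute ID_n = Tag_scan * (2**8 + 2**I) for one second WiFi."""
    if not 0 <= bucket_index <= 7:
        raise ValueError("bucket_index must be in 0..7")
    tag_scan = 1 if scanned else 0
    return tag_scan * (2 ** 8 + 2 ** bucket_index)


ids = [
    sub_scan_id(True, 4),   # W1: scanned, bucket 4
    sub_scan_id(False),     # W2: not scanned
    sub_scan_id(True, 4),   # W3: scanned, bucket 4
    sub_scan_id(True, 1),   # W4: scanned, bucket 1
    sub_scan_id(True, 5),   # W5: scanned, bucket 5
]
print(ids)                    # [272, 0, 272, 258, 288]
print(format(ids[0], "09b"))  # 100010000
```

The 9-bit binary form makes the layout visible: the leading bit is the scanning flag, and exactly one of the remaining 8 bits marks the signal bucket.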
(2) And splicing all sub-scanning distribution identification information according to the distribution sequence among the second WiFi related to the interest points to generate the scanning distribution identification information.
Here, after obtaining the sub-scan distribution identification information, the sub-scan distribution identification information may be spliced according to the distribution sequence of each second WiFi, as shown in fig. 3, where the obtained sub-scan distribution identification information is 272, 000, 272, 258, and 288, and the sub-scan distribution identification information is spliced to obtain the scan distribution identification information 272000272258288.
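The splicing step can be sketched as a string join. Padding each value to three digits is an assumption inferred from the example, where an unscanned WiFi appears as 000; the function name `splice_scan_ids` is hypothetical.

```python
def splice_scan_ids(sub_ids):
    """Concatenate sub-scan IDs, given in distribution order, as 3-digit fields."""
    return "".join(f"{sub_id:03d}" for sub_id in sub_ids)


scan_distribution_id = splice_scan_ids([272, 0, 272, 258, 288])
print(scan_distribution_id)  # 272000272258288
```

Fixed-width fields keep the position of each second WiFi recoverable from the spliced string, since the largest possible value (2^8 + 2^7 = 384) also fits in three digits.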
S1022, generating the co-occurrence information corresponding to the interest point based on the interest point identification information of the interest point and the scanning distribution identification information corresponding to the interest point.
It can be understood that after the scan distribution identification information is obtained, co-occurrence information corresponding to the interest point can be generated based on the interest point identification information and the scan distribution identification information corresponding to the interest point, and specifically, the interest point identification information and the scan distribution identification information corresponding to the interest point can be spliced to generate the co-occurrence information, where the format of the co-occurrence information is a character string format.
For example, if the interest point identification information is 22535659086281011 and the scan distribution identification information corresponding to the interest point is 272000272258288, the co-occurrence information is 272000272258288_22535659086281011.
It should be noted that, the above example is described with respect to one point of interest and a set of first WiFi scan information, and therefore, in the process of actually generating co-occurrence information, for each point of interest, multiple co-occurrence information may be determined based on multiple sets of first WiFi scan information.
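A minimal end-to-end sketch of generating several pieces of co-occurrence information for one interest point is given below. The data layout is an assumption: each scan group is modeled as a mapping from a scanned WiFi identifier to its signal bucket index, and the names W1 to W5 stand in for MAC addresses. The function name `co_occurrence_info` is hypothetical.

```python
def co_occurrence_info(poi_id, ordered_second_wifi, scan_group):
    """Build '<scan distribution ID>_<interest point ID>' for one scan group.

    scan_group maps a scanned WiFi identifier to its signal bucket index (0..7).
    """
    parts = []
    for wifi in ordered_second_wifi:
        if wifi in scan_group:
            sub_id = 2 ** 8 + 2 ** scan_group[wifi]  # scanned: flag bit plus bucket bit
        else:
            sub_id = 0                               # not scanned
        parts.append(f"{sub_id:03d}")
    return "".join(parts) + "_" + poi_id


poi_id = "22535659086281011"
second_wifi = ["W1", "W2", "W3", "W4", "W5"]  # already in distribution order
scan_groups = [
    {"W1": 4, "W3": 4, "W4": 1, "W5": 5},
    {"W1": 4, "W3": 0, "W4": 1, "W5": 5},
]
infos = [co_occurrence_info(poi_id, second_wifi, g) for g in scan_groups]
for info in infos:
    print(info)
```

Each scan group yields its own character string, so one interest point contributes as many pieces of co-occurrence information as there are groups compared against it.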
S103, for each piece of co-occurrence information, generating training sample data based on the co-occurrence information and the label corresponding to the co-occurrence information, wherein the label corresponding to the co-occurrence information is determined based on the preset offline behavior information associated with the interest point corresponding to the co-occurrence information.
The preset offline behavior information may refer to whether a preset offline behavior occurs, and by way of example, the preset offline behavior may refer to an offline consumption behavior, such as a coupon verification, etc.
It should be noted that, when the training sample data is generated, if no offline behavior occurs at any of the interest points corresponding to the first WiFi in a group, the corresponding co-occurrence information is not used to generate training sample data. For example, if the first WiFi in a group of first WiFi scan information includes W11, W12, ..., W19, which correspond to the interest points POI1, POI2, ..., POI9 respectively, and no offline behavior occurs at any of POI1, POI2, ..., POI9, the corresponding co-occurrence information is not used to generate training sample data.
Optionally, for step S103, when generating training sample data for each co-occurrence information based on the co-occurrence information and the tag corresponding to the co-occurrence information, please refer to fig. 4, the following steps S1031 to S1033 may be included:
S1031, for each piece of co-occurrence information, performing feature processing on the co-occurrence information in the character string format to generate a feature vector corresponding to the co-occurrence information.
It should be appreciated that since co-occurrence information is in a string format, it needs to be converted into corresponding feature vectors in order to facilitate model training in subsequent steps.
Optionally, when performing feature processing on the co-occurrence information of the character string format for each co-occurrence information to generate a feature vector corresponding to the co-occurrence information, please refer to fig. 5, including the following S10311-S10312:
S10311, numbering each co-occurrence information of the character string format according to a preset numbering mode, and constructing a feature dictionary based on the numbers corresponding to the co-occurrence information, wherein the feature dictionary is used for converting the co-occurrence information of the character string format into feature vectors.
Specifically, the co-occurrence information may be uniformly numbered for each co-occurrence information, so that each co-occurrence information uniquely corresponds to an integer. For example, please refer to table 1, which shows the correspondence between co-occurrence information and numbers.
TABLE 1
Co-occurrence information               Number (feature dictionary)
272000272258288_22535659086281011       0
272000256258288_22535659086282098       1
272288272258288_22535659086281035       2
...                                     ...
272000272258288_22535659086098765       n
Note that, when the feature dictionary is generated, co-occurrence information that occurs with low frequency (for example, fewer than 5 times) is not numbered.
S10312, for each piece of co-occurrence information in the character string format, performing one-hot encoding on the co-occurrence information based on the feature dictionary to generate the feature vector corresponding to the co-occurrence information.
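Building the feature dictionary with a frequency cutoff can be sketched as follows. The numbering scheme (order of first appearance) is an assumption, since the disclosure only says a preset numbering mode is used; the function name `build_feature_dictionary` and the sample strings are hypothetical.

```python
from collections import Counter


def build_feature_dictionary(co_occurrences, min_count=5):
    """Number each co-occurrence string seen at least min_count times.

    Numbers are assigned in order of first appearance (an assumption).
    """
    counts = Counter(co_occurrences)
    kept = [info for info, count in counts.items() if count >= min_count]
    return {info: number for number, info in enumerate(kept)}


# Hypothetical observations: "b_2" appears only twice and is left unnumbered.
observed = ["a_1"] * 5 + ["b_2"] * 2 + ["c_3"] * 6
feature_dictionary = build_feature_dictionary(observed)
print(feature_dictionary)
```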
Wherein the feature vector is a high-dimensional sparse feature vector.
After the feature dictionary is generated, the co-occurrence information in the character string format can be encoded. In the embodiment of the present disclosure, one-hot encoding is adopted: N states are encoded using an N-bit state register, each state has its own independent register bit, and only one register bit is valid at any time.
Optionally, when one-hot encoding is performed on the co-occurrence information in the character string format based on the feature dictionary to generate the feature vector corresponding to the co-occurrence information, the number of numbers contained in the feature dictionary may be determined first, and a zero vector whose length equals that number may be created. Then, for each piece of co-occurrence information, the number corresponding to the co-occurrence information is looked up in the feature dictionary, and the value at the target index bit of the zero vector equal to that number is set to 1, so as to generate the feature vector corresponding to the co-occurrence information.
For example, referring to table 2, a feature vector conversion process obtained by performing feature conversion on co-occurrence information based on a feature dictionary is shown.
TABLE 2
As can be seen from table 2, if the number of numbers in the feature dictionary is 9, a zero vector having a vector length of 9 is created, and for each co-occurrence information (for example, 272288272258288_22535659086281035), the number corresponding to the co-occurrence information is determined to be 2 in the feature dictionary, and then the value of the register bit having the index bit of 2 in the zero vector is modified to be 1, so that a feature vector 001000000 corresponding to the co-occurrence information is generated.
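The conversion just described can be sketched in a few lines. The dictionary below is filled with placeholder keys so that it holds 9 numbers; only the entry numbered 2 comes from the example. Returning an all-zero vector for unnumbered co-occurrence information is an assumption, and the function name `one_hot` is hypothetical.

```python
def one_hot(info, feature_dictionary):
    """Return a 0/1 list with a single 1 at the number assigned to info.

    Co-occurrence information absent from the dictionary maps to all zeros
    (an assumption; the disclosure does not specify this case).
    """
    vector = [0] * len(feature_dictionary)
    number = feature_dictionary.get(info)
    if number is not None:
        vector[number] = 1
    return vector


target = "272288272258288_22535659086281035"
# Placeholder entries pad the dictionary to 9 numbers.
feature_dictionary = {f"placeholder_{i}": i for i in range(9) if i != 2}
feature_dictionary[target] = 2

vector = one_hot(target, feature_dictionary)
print("".join(str(bit) for bit in vector))  # 001000000
```

In practice such vectors are high-dimensional and sparse, so storing only the index of the nonzero entry is a common alternative to a dense list.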
S1032, obtaining preset offline behavior information associated with the interest point corresponding to the co-occurrence information, and determining a true value corresponding to the tag based on the preset offline behavior information.
Wherein, when the preset offline behavior occurs, the true value corresponding to the label is determined to be 1, and when the preset offline behavior does not occur, the true value corresponding to the label is determined to be 0.
S1033, generating the training sample data based on the feature vector and the true value corresponding to the label.
Thus, after the feature vectors are determined, each feature vector may be labeled with the true value determined in the foregoing step to obtain the training sample data.
S104, training the model to be trained based on the training sample data to obtain a trained model, wherein the trained model is used for determining corresponding interest point positioning information based on the current WiFi scanning information.
The model to be trained may refer to a logistic regression model, and in other embodiments, the model to be trained may also be other models, which are not limited herein.
It can be understood that after the training sample data is obtained, the model to be trained can be trained based on the training sample data (or the training sample data is fitted based on the logistic regression model to be trained), so as to obtain a trained model, and here, each training sample data has a corresponding label, so that the model to be trained can be subjected to supervised training.
In the embodiment of the disclosure, corresponding co-occurrence information is constructed based on the historical WiFi distribution information and the historical WiFi scanning information, and because the co-occurrence information characterizes the scanning state of each scanned set of first WiFi scanning information relative to at least one second WiFi information associated with the interest point, training sample data is generated based on the co-occurrence information to train the model, so that the memory of the model on the association relationship between the WiFi scanning information and the interest point can be enhanced, a trained model is obtained, further, interest point prediction can be performed based on the trained model, and the reliability of the positioning result is further improved.
Optionally, when the model to be trained is trained based on the training sample data to obtain the trained model, each training sample data may be input into the model to be trained to obtain a prediction result corresponding to each training sample data. The model parameters are then adjusted based on the prediction result corresponding to each training sample data and the label of each training sample data. Specifically, a loss function may be preset, a loss value between the prediction result corresponding to each training sample data and the label of each training sample data is calculated, and the model parameters are adjusted based on the loss value. The above process is repeated until the training result meets a preset requirement, yielding the trained model.
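The loop described above can be sketched for a logistic regression model using only the standard library. This is an illustrative sketch, not the implementation of the disclosure: the log loss, the learning rate, the fixed number of passes used as the stopping condition, and the toy one-hot samples are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Predicted probability that the preset offline behavior occurs."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic_regression(samples, dim, lr=0.5, epochs=200):
    """Fit weights by stochastic gradient descent on the log loss."""
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            # Gradient of the log loss with respect to the linear output.
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                if x:  # one-hot input: only the active index is updated
                    weights[i] -= lr * error * x
            bias -= lr * error
    return weights, bias


# Toy samples: (one-hot feature vector, label).
samples = [([1, 0, 0], 1), ([0, 1, 0], 0), ([0, 0, 1], 1), ([0, 1, 0], 0)]
weights, bias = train_logistic_regression(samples, dim=3)
print(round(predict(weights, bias, [1, 0, 0]), 3))
print(round(predict(weights, bias, [0, 1, 0]), 3))
```

Because each input is one-hot, every co-occurrence string effectively gets its own weight, which is how the model memorizes how often a given scan pattern at a given interest point coincided with offline behavior.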
Referring to fig. 6, a method for locating an interest point according to an embodiment of the disclosure includes S601 to S603:
S601, acquiring current WiFi scanning information of the current device and associated WiFi information corresponding to each interest point in the area where the current device is located.
For example, the current WiFi scan information may be obtained by the current electronic device. The current WiFi scanning information comprises at least one WiFi currently scanned and the signal strength of each WiFi.
Here, the area where the current device is located may be determined according to the location information of the current device, and the associated WiFi information corresponding to each interest point may be pre-constructed.
S602, for each interest point, generating current co-occurrence information based on the current WiFi scanning information and associated WiFi information corresponding to the interest point.
In the step, the current co-occurrence information is obtained by combining the current WiFi scanning information with the interest point identification information of each interest point respectively and performing One-hot encoding (One-hot) on the combined information, wherein the current co-occurrence information is in a vector form.
S603, inputting each piece of current co-occurrence information into the trained model to obtain the interest point positioning information of the current device.
Wherein the trained model is obtained by the model training method of any one of the above.
It can be understood that, after each piece of current co-occurrence information is input into the trained model, the model can output the probability corresponding to each interest point in the area where the current device is located. Therefore, in the process of POI location, only the current co-occurrence information in vector form needs to be input into the trained model, and the interest point positioning information of the current device can be determined according to the probability corresponding to each interest point. In particular, the interest point corresponding to the maximum probability can be determined as the interest point where the current device is located.
In some embodiments, after the interest point where the current device is located is determined, information recommendations (e.g., coupon recommendations, merchandise recommendations, online advertising delivery, etc.) may be made to the user of the current device based on that interest point.
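Selecting the interest point with the maximum probability can be sketched as below. The function name `locate_interest_point`, the interest point names, and the probability values are hypothetical.

```python
def locate_interest_point(probabilities):
    """Return the interest point whose predicted probability is largest."""
    return max(probabilities, key=probabilities.get)


# Hypothetical model outputs for the interest points in the current area.
probabilities = {"POI-1": 0.12, "POI-2": 0.81, "POI-3": 0.35}
located = locate_interest_point(probabilities)
print(located)  # POI-2
```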
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same inventive concept, the embodiment of the disclosure further provides a model training device corresponding to the model training method, and since the principle of solving the problem by the device in the embodiment of the disclosure is similar to that of the model training method in the embodiment of the disclosure, the implementation of the device can refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 7, a schematic diagram of a model training apparatus 700 according to an embodiment of the disclosure is provided, where the apparatus includes:
The information acquisition module 701 is configured to acquire historical wireless fidelity WiFi scan information and historical WiFi distribution information, where the historical WiFi scan information includes multiple sets of WiFi scan information, each set of WiFi scan information includes at least one first WiFi information, and the historical WiFi distribution information includes second WiFi information associated with each interest point;
The information generation module 702 is configured to generate co-occurrence information corresponding to each interest point based on second WiFi information associated with the interest point and at least one first WiFi information in each set of WiFi scan information, where the co-occurrence information is used to characterize a scan state of each first WiFi information in each set relative to the second WiFi information associated with the interest point;
The sample generation module 703 is configured to generate training sample data for each co-occurrence information based on the co-occurrence information and a tag corresponding to the co-occurrence information, where the tag corresponding to the co-occurrence information is determined based on preset offline behavior information associated with an interest point corresponding to the co-occurrence information;
The model training module 704 is configured to train a model to be trained based on each training sample data to obtain a trained model, where the trained model is configured to determine corresponding interest point positioning information based on current WiFi scan information.
In one possible implementation, each interest point has corresponding interest point identification information, and the information generation module 702 is specifically configured to:
for each interest point, generate scan distribution identification information corresponding to the interest point based on the second WiFi information associated with the interest point and the first WiFi information in each set;
and generate the co-occurrence information corresponding to the interest point based on the interest point identification information of the interest point and the scan distribution identification information corresponding to the interest point.
In one possible implementation, the first WiFi information includes a first WiFi and a signal strength of the first WiFi, the second WiFi information includes a second WiFi and a signal strength of the second WiFi, a distribution order exists among the second WiFi associated with each interest point, the distribution order is determined by the signal strength of each second WiFi, and the information generation module 702 is specifically configured to:
for each second WiFi associated with each interest point, generate sub-scan distribution identification information based on the second WiFi and each first WiFi in each set;
and splice the pieces of sub-scan distribution identification information according to the distribution order among the second WiFi associated with the interest point to generate the scan distribution identification information.
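The splicing step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the MAC-like keys, and the example bit strings are hypothetical, and ordering from strongest to weakest is an assumption, since the text only states that the order is determined by signal strength.

```python
from typing import Dict, List, Tuple


def splice_in_distribution_order(
    poi_wifi: List[Tuple[str, int]],
    sub_ids: Dict[str, str],
) -> str:
    """Concatenate per-WiFi sub-identifiers in the distribution order.

    poi_wifi holds (identifier, signal strength) pairs for the second WiFi
    associated with one interest point. Assumption: strongest first.
    """
    ordered = sorted(poi_wifi, key=lambda item: item[1], reverse=True)
    return "".join(sub_ids[mac] for mac, _ in ordered)


# Hypothetical second WiFi of one interest point, strengths in dBm.
poi_wifi = [("aa:01", -70), ("aa:02", -45), ("aa:03", -60)]
# Hypothetical sub-scan distribution identification information per second WiFi.
sub_ids = {"aa:01": "0000", "aa:02": "1100", "aa:03": "1010"}

scan_distribution_id = splice_in_distribution_order(poi_wifi, sub_ids)
print(scan_distribution_id)  # 110010100000
```

Fixing the order per interest point means that the same bit position always refers to the same second WiFi, so identifiers built from different scans remain comparable.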
In one possible implementation, the sub-scan distribution identification information corresponds to one scan flag bit and a plurality of signal strength flag bits, different signal strength flag bits characterize different signal strengths, and the information generation module 702 is specifically configured to:
for each second WiFi, determine that the value of the scan flag bit is 1 when there is a target first WiFi having the same media access control (MAC) address as the second WiFi;
determine, from the signal strength flag bits, a target signal strength flag bit corresponding to the signal strength of the target first WiFi, set the value of the target signal strength flag bit to 1, and set the values of the other signal strength flag bits to 0;
and generate the sub-scan distribution identification information based on the value of the scan flag bit and the values of the signal strength flag bits.
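A minimal sketch of the flag-bit encoding described above is given below. The bucket boundaries, the function name `sub_scan_id`, and the behavior when the second WiFi is not scanned (all bits 0) are assumptions made for illustration; the text does not specify them.

```python
from typing import Dict, List, Tuple

# Hypothetical signal strength buckets in dBm, as (low inclusive, high exclusive).
STRENGTH_BUCKETS: List[Tuple[int, int]] = [(-50, 0), (-70, -50), (-100, -70)]


def sub_scan_id(second_mac: str, scan: Dict[str, int]) -> str:
    """Return one scan flag bit followed by one flag bit per strength bucket.

    scan maps the MAC address of each first WiFi in one set of scan
    information to its signal strength.
    """
    strength_bits = [0] * len(STRENGTH_BUCKETS)
    if second_mac not in scan:
        # Assumption: every bit stays 0 when no first WiFi shares the MAC address.
        return "0" + "".join(str(bit) for bit in strength_bits)
    rssi = scan[second_mac]
    for index, (low, high) in enumerate(STRENGTH_BUCKETS):
        if low <= rssi < high:
            strength_bits[index] = 1  # target signal strength flag bit
            break
    return "1" + "".join(str(bit) for bit in strength_bits)


# One hypothetical set of WiFi scan information.
scan = {"aa:02": -45, "aa:03": -60}
print(sub_scan_id("aa:02", scan))  # 1100
print(sub_scan_id("aa:03", scan))  # 1010
print(sub_scan_id("aa:01", scan))  # 0000
```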
In one possible implementation, the co-occurrence information is in a character string format, and the sample generation module 703 is specifically configured to:
for each piece of co-occurrence information, perform feature processing on the co-occurrence information in the character string format to generate a feature vector corresponding to the co-occurrence information;
acquire the preset offline behavior information associated with the interest point corresponding to the co-occurrence information, and determine a true value corresponding to the label based on the preset offline behavior information;
and generate the training sample data based on the feature vector and the true value corresponding to the label.
In one possible implementation, the sample generation module 703 is specifically configured to:
determine the true value corresponding to the label as 1 when the preset offline behavior occurs, or
determine the true value corresponding to the label as 0 when the preset offline behavior does not occur.
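The labeling rule and the pairing of a feature vector with its label can be sketched as follows. The function names and the example vector are hypothetical, and how the occurrence of the preset offline behavior is detected is outside this sketch.

```python
from typing import List, Tuple


def label_true_value(offline_behavior_occurred: bool) -> int:
    """Map the preset offline behavior to the label's true value (1 or 0)."""
    return 1 if offline_behavior_occurred else 0


def make_training_sample(
    feature_vector: List[int],
    offline_behavior_occurred: bool,
) -> Tuple[List[int], int]:
    """Pair a feature vector with the true value of its label."""
    return feature_vector, label_true_value(offline_behavior_occurred)


# Hypothetical feature vector of one piece of co-occurrence information.
sample = make_training_sample([0, 1, 0], offline_behavior_occurred=True)
print(sample)  # ([0, 1, 0], 1)
```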
In one possible implementation, the sample generation module 703 is specifically configured to:
number each piece of co-occurrence information in the character string format according to a preset numbering scheme, and construct a feature dictionary based on the numbers corresponding to the respective pieces of co-occurrence information, where the feature dictionary is used to convert the co-occurrence information in the character string format into feature vectors;
and for each piece of co-occurrence information in the character string format, perform one-hot encoding on the co-occurrence information based on the feature dictionary to generate the feature vector corresponding to the co-occurrence information.
In one possible implementation, the sample generation module 703 is specifically configured to:
determine the number of numbers contained in the feature dictionary, and create a zero vector whose length equals that number;
and for each piece of co-occurrence information, according to the number corresponding to the co-occurrence information in the feature dictionary, change the value at the target index position in the zero vector that matches the number to 1 to generate the feature vector corresponding to the co-occurrence information.
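The dictionary construction and one-hot encoding described above can be sketched as follows. The numbering scheme used here (zero-based, in order of first appearance) and the example co-occurrence strings are assumptions; the text only refers to a preset numbering scheme.

```python
from typing import Dict, List


def build_feature_dictionary(co_occurrences: List[str]) -> Dict[str, int]:
    """Number each distinct co-occurrence string in order of first appearance."""
    dictionary: Dict[str, int] = {}
    for text in co_occurrences:
        if text not in dictionary:
            dictionary[text] = len(dictionary)
    return dictionary


def one_hot(text: str, dictionary: Dict[str, int]) -> List[int]:
    """Create a zero vector as long as the dictionary and set the numbered index to 1."""
    vector = [0] * len(dictionary)
    vector[dictionary[text]] = 1
    return vector


# Hypothetical co-occurrence strings: interest point identifier plus scan bits.
co_occurrences = ["poi1_1100", "poi1_0000", "poi2_1010", "poi1_1100"]
feature_dictionary = build_feature_dictionary(co_occurrences)
vectors = [one_hot(text, feature_dictionary) for text in co_occurrences]
print(vectors)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

With this encoding the vector length grows with the number of distinct co-occurrence strings, so a sparse representation may be preferable at scale.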
In one possible implementation, the model training module 704 is specifically configured to:
input each training sample data into the model to be trained to obtain a prediction result corresponding to each training sample;
adjust the model parameters of the model to be trained based on the prediction result corresponding to each training sample and the label of each training sample data;
and repeat the above process until the training result meets a preset requirement, to obtain the trained model.
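The predict, adjust, and repeat loop can be sketched as follows. The text does not name a model type or a stopping criterion, so this sketch assumes a logistic regression model trained by stochastic gradient descent, with a mean-loss threshold standing in for the preset requirement; the learning rate, epoch limit, and sample data are likewise hypothetical.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[int], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(x: List[int], weights: List[float], bias: float) -> float:
    """Prediction result: estimated probability that the label is 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(
    samples: List[Sample],
    lr: float = 0.5,
    max_epochs: int = 2000,
    target_loss: float = 0.05,
) -> Tuple[List[float], float]:
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        total_loss = 0.0
        for x, y in samples:
            p = predict(x, weights, bias)
            total_loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
        # Assumed preset requirement: mean loss over the epoch below a threshold.
        if total_loss / len(samples) < target_loss:
            break
    return weights, bias


# Hypothetical training sample data: (one-hot feature vector, label).
samples = [([1, 0, 0], 1), ([0, 1, 0], 0), ([0, 0, 1], 1)]
weights, bias = train(samples)
print(predict([1, 0, 0], weights, bias) > 0.5)  # True
print(predict([0, 1, 0], weights, bias) < 0.5)  # True
```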
Referring to FIG. 8, an interest point positioning device provided in an embodiment of the present disclosure includes:
an obtaining module 801, configured to obtain current WiFi scan information of a current device and associated WiFi information corresponding to each interest point in the area where the current device is located;
a generating module 802, configured to generate, for each interest point, current co-occurrence information based on the current WiFi scan information and the associated WiFi information corresponding to the interest point;
and a positioning module 803, configured to input each piece of current co-occurrence information into the trained model to obtain the interest point positioning information of the current device, where the trained model is obtained through the model training method of the foregoing embodiments.
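The positioning flow can be sketched as follows. The text does not state how the model outputs are turned into positioning information, so returning the highest-scoring interest point is an assumption. `toy_co_occurrence` and `toy_score` are hypothetical stand-ins for the co-occurrence generation and the trained model, included only so that the sketch runs on its own.

```python
from typing import Callable, Dict, List


def locate_poi(
    current_scan: Dict[str, int],
    poi_wifi: Dict[str, List[str]],
    build_co_occurrence: Callable[[str, List[str], Dict[str, int]], str],
    score: Callable[[str], float],
) -> str:
    """Score the current co-occurrence information of each interest point.

    Assumption: the interest point with the highest score is returned.
    """
    scores = {
        poi_id: score(build_co_occurrence(poi_id, macs, current_scan))
        for poi_id, macs in poi_wifi.items()
    }
    return max(scores, key=lambda poi_id: scores[poi_id])


def toy_co_occurrence(poi_id: str, macs: List[str], scan: Dict[str, int]) -> str:
    """Stand-in: interest point identifier plus one scanned/not-scanned bit per WiFi."""
    return poi_id + "_" + "".join("1" if mac in scan else "0" for mac in macs)


def toy_score(co_occurrence: str) -> float:
    """Stand-in for the trained model: fraction of associated WiFi that was scanned."""
    bits = co_occurrence.rsplit("_", 1)[1]
    return bits.count("1") / len(bits)


# Hypothetical associated WiFi per interest point and current scan (dBm).
poi_wifi = {"cafe": ["aa:01", "aa:02"], "gym": ["bb:01", "bb:02"]}
current_scan = {"aa:01": -48, "aa:02": -66, "bb:01": -80}

best = locate_poi(current_scan, poi_wifi, toy_co_occurrence, toy_score)
print(best)  # cafe
```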
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
Based on the same technical concept, the embodiments of the disclosure also provide an electronic device. Referring to FIG. 9, a schematic structural diagram of an electronic device 900 according to an embodiment of the disclosure is shown; the electronic device 900 includes a processor 901, a memory 902, and a bus 903. The memory 902 is used for storing execution instructions and includes an internal memory 9021 and an external memory 9022. The internal memory 9021 temporarily stores operation data of the processor 901 and data exchanged with the external memory 9022, such as a hard disk, and the processor 901 exchanges data with the external memory 9022 through the internal memory 9021.
In the embodiment of the present application, the memory 902 is specifically configured to store application program code for executing the solution of the present application, and execution is controlled by the processor 901. That is, when the electronic device 900 is running, the processor 901 and the memory 902 communicate via the bus 903, so that the processor 901 executes the application program code stored in the memory 902 and thereby performs the method described in any of the foregoing embodiments.
The memory 902 may be, but is not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), or the like.
The processor 901 may be an integrated circuit chip with signal processing capabilities. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like, or may be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 900. In other embodiments of the application, the electronic device 900 may include more or fewer components than illustrated, or certain components may be combined, or certain components may be split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The embodiments of the disclosure also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the model training method in the method embodiments described above. The storage medium may be a volatile or nonvolatile computer readable storage medium.
Embodiments of the present disclosure further provide a computer program product carrying program code, where the program code includes instructions for performing the steps of model training in the foregoing method embodiments, and specifically reference may be made to the foregoing method embodiments, which are not described herein.
The above computer program product may be implemented by hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the above-described system and apparatus may refer to the corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is merely a division by logical function, and other divisions are possible in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interface, device, or unit, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method described in the embodiments of the present disclosure. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
It should be noted that the foregoing embodiments are merely specific implementations of the disclosure and are not intended to limit its scope. Although the disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope of the disclosure, the technical solutions described in the foregoing embodiments may still be modified or varied, or some of their technical features may be replaced by equivalents, without departing from the spirit and scope of the technical solutions of the embodiments of the disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

1. A method of model training, comprising:
acquiring historical wireless fidelity (WiFi) scanning information and historical WiFi distribution information, wherein the historical WiFi scanning information comprises a plurality of groups of WiFi scanning information, each group of WiFi scanning information comprises at least one first WiFi information, and the historical WiFi distribution information comprises second WiFi information associated with each interest point;
for each interest point, generating co-occurrence information corresponding to the interest point based on the second WiFi information associated with the interest point and the at least one first WiFi information in each group of WiFi scanning information, wherein the co-occurrence information is used for representing the scanning state of each first WiFi information in each group relative to the second WiFi information associated with the interest point;
for each co-occurrence information, generating training sample data based on the co-occurrence information and a label corresponding to the co-occurrence information, wherein the label corresponding to the co-occurrence information is determined based on preset offline behavior information associated with the interest point corresponding to the co-occurrence information;
training a model to be trained based on each training sample data to obtain a trained model, wherein the trained model is used for determining corresponding interest point positioning information based on current WiFi scanning information;
wherein the first WiFi information comprises a first WiFi and a signal strength of the first WiFi, and the second WiFi information comprises a second WiFi and a signal strength of the second WiFi.
2. The method of claim 1, wherein generating co-occurrence information corresponding to each point of interest based on the second WiFi information associated with the point of interest and at least one first WiFi information in each set of WiFi scan information, comprises:
for each interest point, generating scanning distribution identification information corresponding to the interest point based on the second WiFi information associated with the interest point and the first WiFi information in each group;
and generating the co-occurrence information corresponding to the interest point based on the interest point identification information of the interest point and the scanning distribution identification information corresponding to the interest point.
3. The method of claim 2, wherein there is a distribution order between the second WiFi associated with each point of interest, the distribution order being determined by the signal strengths of the respective second WiFi;
the generating, for each interest point, scanning distribution identification information corresponding to the interest point based on the second WiFi information associated with the interest point and the first WiFi information in each group comprises:
for each second WiFi associated with each point of interest, generating sub-scanning distribution identification information based on the second WiFi and each first WiFi in each group;
splicing all sub-scanning distribution identification information according to the distribution order among the second WiFi associated with the interest point to generate the scanning distribution identification information;
wherein the sub-scanning distribution identification information corresponds to one scanning flag bit and a plurality of signal strength flag bits.
4. The method of claim 3, wherein the generating sub-scan distribution identification information for each second WiFi associated with each point of interest based on the second WiFi and each first WiFi in each group comprises:
for each second WiFi, determining that the value of the scanning flag bit is 1 when there is a target first WiFi having the same media access control (MAC) address as the second WiFi;
determining, from the signal strength flag bits, a target signal strength flag bit corresponding to the signal strength of the target first WiFi, determining the value of the target signal strength flag bit as 1, and determining the values of the signal strength flag bits other than the target signal strength flag bit as 0;
and generating the sub-scanning distribution identification information based on the value of the scanning flag bit and the values of the signal strength flag bits.
5. The method of claim 1, wherein the co-occurrence information is in a character string format, and wherein the generating, for each co-occurrence information, training sample data based on the co-occurrence information and a label corresponding to the co-occurrence information comprises:
for each co-occurrence information, performing feature processing on the co-occurrence information in the character string format to generate a feature vector corresponding to the co-occurrence information;
acquiring preset offline behavior information associated with the interest point corresponding to the co-occurrence information, and determining a true value corresponding to the label based on the preset offline behavior information;
and generating the training sample data based on the feature vector and the true value corresponding to the label.
6. The method of claim 5, wherein determining the true value corresponding to the label based on the preset offline behavior information comprises:
determining the true value corresponding to the label as 1 when the preset offline behavior occurs, or determining the true value corresponding to the label as 0 when the preset offline behavior does not occur.
7. The method according to claim 5, wherein the performing feature processing on the co-occurrence information in the character string format for each co-occurrence information to generate a feature vector corresponding to the co-occurrence information includes:
numbering each co-occurrence information in the character string format according to a preset numbering mode, and constructing a feature dictionary based on the numbers corresponding to the respective co-occurrence information, wherein the feature dictionary is used for converting the co-occurrence information in the character string format into feature vectors;
and for each co-occurrence information in the character string format, performing one-hot encoding processing on the co-occurrence information in the character string format based on the feature dictionary to generate the feature vector corresponding to the co-occurrence information.
8. The method of claim 7, wherein the performing one-hot encoding processing on the co-occurrence information in the character string format based on the feature dictionary to generate the feature vector corresponding to the co-occurrence information comprises:
determining the number of numbers contained in the feature dictionary, and creating a zero vector whose vector length equals that number;
and for each co-occurrence information, according to the number corresponding to the co-occurrence information in the feature dictionary, modifying the value of the target index bit in the zero vector that is the same as the number to 1 to generate the feature vector corresponding to the co-occurrence information.
9. The method according to claim 1, wherein training the model to be trained based on the respective training sample data to obtain a trained model comprises:
Inputting each training sample data into the model to be trained to obtain a prediction result corresponding to each training sample;
Based on the prediction result corresponding to each training sample and the label of each training sample data, adjusting model parameters of the model to be trained;
repeating the above process until the training result meets the preset requirement, and obtaining the trained model.
10. A method of locating a point of interest, comprising:
acquiring current WiFi scanning information of current equipment and associated WiFi information corresponding to each interest point corresponding to an area where the current equipment is located;
for each interest point, generating current co-occurrence information based on the current WiFi scanning information and the associated WiFi information corresponding to the interest point;
and inputting each piece of current co-occurrence information into a trained model to obtain the interest point positioning information of the current equipment, wherein the trained model is obtained by the model training method according to any one of claims 1-9.
11. A model training device, comprising:
an information acquisition module, configured to acquire historical wireless fidelity (WiFi) scanning information and historical WiFi distribution information, wherein the historical WiFi scanning information comprises a plurality of groups of WiFi scanning information, each group of WiFi scanning information comprises at least one first WiFi information, and the historical WiFi distribution information comprises second WiFi information associated with each interest point;
an information generation module, configured to generate, for each interest point, co-occurrence information corresponding to the interest point based on the second WiFi information associated with the interest point and the at least one first WiFi information in each group of WiFi scanning information, wherein the co-occurrence information is used for representing the scanning state of each first WiFi information in each group relative to the second WiFi information associated with the interest point;
a sample generation module, configured to generate, for each co-occurrence information, training sample data based on the co-occurrence information and a label corresponding to the co-occurrence information;
a model training module, configured to train a model to be trained based on each training sample data to obtain a trained model, wherein the trained model is used for determining corresponding interest point positioning information based on current WiFi scanning information;
wherein the first WiFi information comprises a first WiFi and a signal strength of the first WiFi, and the second WiFi information comprises a second WiFi and a signal strength of the second WiFi.
12. A point of interest locating device, comprising:
an acquisition module, configured to acquire current WiFi scanning information of the current equipment and associated WiFi information corresponding to each interest point corresponding to the area where the current equipment is located;
a generating module, configured to generate, for each interest point, current co-occurrence information based on the current WiFi scanning information and the associated WiFi information corresponding to the interest point;
a positioning module, configured to input each piece of current co-occurrence information into a trained model to obtain the interest point positioning information of the current equipment, wherein the trained model is obtained through the model training method according to any one of claims 1-9.
13. An electronic device comprising a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is in operation, the machine-readable instructions, when executed by the processor, performing the steps of the model training method of any one of claims 1 to 9 or the point of interest locating method of claim 10.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the model training method according to any one of claims 1 to 9 or the point of interest locating method according to claim 10.
CN202310678757.5A 2023-06-08 2023-06-08 Model training method and device, electronic equipment and storage medium Active CN116663677B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202310678757.5A CN116663677B (en) 2023-06-08 2023-06-08 Model training method and device, electronic equipment and storage medium
PCT/CN2024/096051 WO2024251004A1 (en) 2023-06-08 2024-05-29 Model training method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310678757.5A CN116663677B (en) 2023-06-08 2023-06-08 Model training method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116663677A CN116663677A (en) 2023-08-29
CN116663677B true CN116663677B (en) 2025-03-14

Family

ID=87727636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310678757.5A Active CN116663677B (en) 2023-06-08 2023-06-08 Model training method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN116663677B (en)
WO (1) WO2024251004A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116663677B (en) * 2023-06-08 2025-03-14 抖音视界有限公司 Model training method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2773302A1 (en) * 2011-04-05 2012-10-05 Her Majesty The Queen In Right Of Canada, As Represented By The Ministerof Industry, Through The Communications Research Centre Canada Cognitive wi-fi radio network
CN110782284A (en) * 2019-10-24 2020-02-11 腾讯科技(深圳)有限公司 Information pushing method and device and readable storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103200673B (en) * 2013-03-07 2016-03-23 浙江大学 Wireless terminal location method and positioner
CN107729459A (en) * 2017-09-30 2018-02-23 百度在线网络技术(北京)有限公司 Map interest point failure method for digging, device, equipment and computer-readable recording medium
CN110781256B (en) * 2019-08-30 2024-02-23 腾讯大地通途(北京)科技有限公司 Method and device for determining POI matched with Wi-Fi based on sending position data
CN111417066B (en) * 2020-03-05 2021-01-05 滴图(北京)科技有限公司 Positioning method, positioning device, electronic equipment and computer readable storage medium
CN111954175B (en) * 2020-08-25 2022-08-02 腾讯科技(深圳)有限公司 Method for judging visiting of interest point and related device
CN112399555B (en) * 2020-10-20 2023-07-14 北京嘀嘀无限科技发展有限公司 Position locating method and device, readable storage medium and electronic equipment
CN112804634B (en) * 2020-12-31 2023-04-07 北京嘀嘀无限科技发展有限公司 Wi-Fi signal processing method, device, equipment and storage medium
CN114861017A (en) * 2022-04-02 2022-08-05 深圳依时货拉拉科技有限公司 Point-of-interest recommendation method, apparatus, computer device and storage medium
CN116193362A (en) * 2023-03-02 2023-05-30 北京抖音智图科技有限公司 Method, device, device and storage medium for maintaining point of interest information
CN116663677B (en) * 2023-06-08 2025-03-14 抖音视界有限公司 Model training method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2024251004A1 (en) 2024-12-12
CN116663677A (en) 2023-08-29

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant