Disclosure of Invention
Embodiments of the present application provide a data processing method, an apparatus, a device, and a storage medium, so as to solve the problems of low data processing efficiency and high cost in the existing scheme for mining association relationships or mapping relationships between different IDs.
According to a first aspect of the present application, an embodiment of the present application provides a data processing method, including:
acquiring an identification file to be processed, wherein the identification file comprises: at least one identity to be matched;
comparing the at least one identity to be matched with the identities in known mapping relation information to obtain a matching result, wherein the matching result comprises: at least one first identity matched to an account identification code, and the account identification code corresponding to each first identity; and
outputting the account identification code corresponding to each first identity.
In one possible design of the first aspect, the method further includes:
parsing the identification file to obtain the at least one identity to be matched; and
adding each identity to be matched to known first identity relation information to obtain second identity relation information.
Optionally, the matching result further includes: at least one second identity which is not matched with the account identification code;
correspondingly, the method further comprises the following steps:
for each second identity, determining whether a target identity having an association relation with the second identity exists in the second identity relation information;
if so, determining a target account identification code corresponding to the target identity according to the target identity and the known mapping relation information; and
determining the target account identification code as the account identification code corresponding to the second identity.
Optionally, the method further includes:
updating the known mapping relation information according to the mapping relation between the second identity and the target account identification code.
In another possible design of the first aspect, before acquiring the identification file to be processed, the method further includes:
acquiring a historical data set;
processing the data in the historical data set to obtain a plurality of identities included in the historical data set and mapping relation data between those identities and account identification codes;
classifying and storing the identities in the historical data set according to whether association relationships exist among them, to obtain first identity relation information; and
generating the known mapping relation information based on the mapping relation data between the identities and the account identification codes and on the encoding types of the identities.
Optionally, the first identity relation information is represented by an identity relationship table, in which identities in the same row have an association relationship and belong to different encoding types, the association relationship including at least one of the following: belonging to the same device, and belonging to the same user.
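The historical-data steps above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each historical record is a list of (ID, type code) pairs observed together (for example, in one log row) plus an optional account PIN. IDs that co-occur are grouped into relation rows keyed by encoding type using a union-find structure, and every ID in a record that carries a PIN is mapped to that PIN. The record layout, the names `Record`, `build_known_info` and `HISTORY`, and the "keep the first ID of each type" rule are illustrative assumptions, not the claimed implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: IDs observed together (e.g. one log row) as
# (id_value, type_code) pairs, plus the account PIN if the record carries one.
Record = Tuple[List[Tuple[str, int]], Optional[str]]


def build_known_info(
    records: List[Record],
) -> Tuple[List[Dict[int, str]], Dict[str, Tuple[str, int]]]:
    """Group associated IDs into relation rows and collect ID -> (PIN, type)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    id_type: Dict[str, int] = {}
    mapping: Dict[str, Tuple[str, int]] = {}
    for ids, pin in records:
        for id_value, type_code in ids:
            id_type[id_value] = type_code
            find(id_value)  # register the ID even if it has no partner
            if pin is not None:
                mapping[id_value] = (pin, type_code)
        # IDs in the same record are treated as associated (same device/user).
        for (a, _), (b, _) in zip(ids, ids[1:]):
            union(a, b)

    # One row per connected group, keyed by encoding type; if a group holds
    # two IDs of the same type, this sketch keeps the first one seen.
    groups: Dict[str, Dict[int, str]] = {}
    for id_value, type_code in id_type.items():
        groups.setdefault(find(id_value), {}).setdefault(type_code, id_value)
    return list(groups.values()), mapping


# Hypothetical history chosen to reproduce the example tables.
HISTORY: List[Record] = [
    ([("A001", 1), ("B034", 3), ("C123", 4)], "PIN1"),
    ([("C001", 4)], "PIN2"),
    ([("B077", 3)], "PIN4"),
]

if __name__ == "__main__":
    rows, mapping = build_known_info(HISTORY)
    print(rows)
    print(mapping)
```

With these example records, the rows follow the layout of the identity relationship table (Table 3 in the detailed description), and the mapping holds the same entries as the mapping relationship table (Table 2).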
In yet another possible design of the first aspect, the method further includes:
pushing information to the device corresponding to each first identity based on the account identification code corresponding to that first identity.
According to a second aspect of the present application, an embodiment of the present application provides a data processing apparatus, including:
an obtaining module, configured to obtain an identification file to be processed, where the identification file includes at least one identity to be matched;
a processing module, configured to compare the at least one identity to be matched with the identities in known mapping relation information to obtain a matching result, where the matching result includes at least one first identity matched to an account identification code and the account identification code corresponding to each first identity; and
an output module, configured to output the account identification code corresponding to each first identity.
In a possible design of the second aspect, the processing module is further configured to:
parse the identification file to obtain the at least one identity to be matched; and
add each identity to be matched to known first identity relation information to obtain second identity relation information.
Optionally, the matching result further includes: at least one second identity which is not matched with the account identification code;
correspondingly, the processing module is further configured to:
for each second identity, determine whether a target identity having an association relation with the second identity exists in the second identity relation information;
if so, determine a target account identification code corresponding to the target identity according to the target identity and the known mapping relation information; and
determine the target account identification code as the account identification code corresponding to the second identity.
Optionally, the processing module is further configured to update the known mapping relationship information according to the mapping relationship between the second identity and the target account identification code.
In another possible design of the second aspect, the obtaining module is further configured to obtain a historical data set before obtaining the identification file to be processed;
the processing module is further configured to:
process the data in the historical data set to obtain a plurality of identities included in the historical data set and mapping relation data between those identities and account identification codes;
classify and store the identities in the historical data set according to whether association relationships exist among them, to obtain first identity relation information; and
generate the known mapping relation information based on the mapping relation data between the identities and the account identification codes and on the encoding types of the identities.
In yet another possible design of the second aspect, the output module is further configured to push information to a device corresponding to each first identity identifier based on an account identification code corresponding to the first identity identifier.
According to a third aspect of the present application, an embodiment of the present application provides a data processing device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to the first aspect and its possible designs.
According to a fourth aspect of the present application, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method according to the first aspect and its possible designs.
According to a fifth aspect of the present application, an embodiment of the present application provides a computer program product, including: a computer program stored in a readable storage medium from which at least one processor of a data processing apparatus can read the computer program, execution of the computer program by the at least one processor causing the data processing apparatus to perform the method of the first aspect.
With the data processing method, apparatus, device and storage medium provided by the present application, after the identification file to be processed is obtained, the at least one identity to be matched in the identification file is compared with the identities in the known mapping relation information to obtain a matching result, which comprises at least one first identity matched to an account identification code and the account identification code corresponding to each first identity; finally, the account identification code corresponding to each first identity is output. This technical solution matches identifiers (IDs) to account identification codes (PINs) automatically, without manual participation, which reduces cost and improves data processing efficiency.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
At present, many e-commerce websites actively accumulate and store IDs as data assets and, based on traffic behavior logs, establish mapping relationships between a user's service account number (PIN) and two other types of IDs. (To distinguish them hereinafter, the user service account number is simply called the account PIN, and "ID" refers mainly to the other two types, although all three types strictly belong to the ID category.)
On the one hand, these ID data and mapping data help the operations team of an e-commerce enterprise reach users through account PINs or devices; on the other hand, they help the enterprise's algorithm team carry out further analysis and mining, such as establishing associations between users (for example, a person-to-person relationship graph) and identity recognition. The number and types of accumulated IDs, and the mappings between IDs and account PINs, are therefore very important: they are the foundation for building user relationship identification algorithms.
However, there are a great many ID types, and for technical or permission reasons different IDs are stored separately in different teams or departments; even IDs of the same type may be spread across the data platforms of different departments, for example with one part stored on the data platform of department A and another part on that of department B. At present, mapping or association relationships between different IDs are mined by having operators find out where each ID comes from, determine which party holds the permissions for each ID source, and then work out the relationships through back-and-forth communication. This approach is passive, incurs high communication cost, and has low data processing efficiency.
In view of these technical problems, the technical idea of the present application is as follows. Building on the existing manual collection and accumulation of ID data assets, bidirectional mapping relationships have already been established between some IDs and account PINs at the initial scale of those assets, yielding known mapping relation information. After business personnel obtain user identity identifiers (such as device IDs) from different channels, the account PINs corresponding to those identifiers can be determined from the known mapping relation information. In other words, by turning the ID mapping requirement into a product function, ID collection changes from passive to active: once bidirectional mappings between some IDs and PINs exist in the existing ID data set, the PINs corresponding to some IDs obtained from different sources can be determined, and the relationships among IDs can further be used to increase the number of ID-to-PIN mappings.
Based on this technical idea, an embodiment of the present application provides a data processing method in which, after an identification file to be processed is obtained, the at least one identity to be matched in the identification file is compared with the identities in the known mapping relation information to obtain a matching result, which comprises at least one first identity matched to an account identification code and the account identification code corresponding to each first identity; finally, the account identification code corresponding to each first identity is output. This technical solution matches identifiers (IDs) to account identification codes (PINs) automatically, without manual participation, which reduces cost and improves data processing efficiency.
Exemplarily, fig. 1 is a schematic view of an application scenario of a data processing method provided in an embodiment of the present application. As shown in fig. 1, the application scenario may include: at least one terminal device (fig. 1 shows three terminal devices, respectively, terminal device 111, terminal device 112, terminal device 113), a network 12, a service server 13 and a data processing device 14. Each terminal device and the service server 13 may communicate with each other through the network 12, so that the service server 13 may obtain the user operation behavior data, and thereby determine the user identity information carried in the user operation behavior data.
Optionally, the service server 13 may further generate an identification file to be processed according to the user identity identification information carried in the user operation behavior data.
In the application scenario shown in fig. 1, the data processing device 14 may interact directly with the service server 13 and obtain the identification file to be processed from it; alternatively, the data processing device 14 may receive an identification file uploaded by an operator through the operation terminal 16, match each identity to be matched in the file, and feed the matching result back to the operation terminal 16 for display. The embodiments of the present application do not limit how the data processing device obtains the identification file to be processed; this can be determined according to the actual scenario and is not described further here.
In this embodiment, the data processing device 14 may execute the program code of the data processing method provided in the present application based on the acquired to-be-processed identification file, so as to obtain a matching result of each to-be-matched identification.
Optionally, the application scenario shown in fig. 1 may further include a data storage device 15, where the data storage device 15 may be connected to the service server 13, or may be connected to the data processing device 14, and is configured to store data output by the service server 13 and/or data output by the data processing device 14.
It should be noted that fig. 1 is only a schematic diagram of one application scenario provided by an embodiment of the present application, and the embodiments do not limit the devices included in fig. 1 or the positional relationships between them. For example, in fig. 1 the data storage device 15 may be an external memory with respect to the service server 13, whereas in other cases it may be disposed inside the service server 13; likewise, the data processing device 14 may exist separately from the service server 13 or be integrated into it as a component.
In practical applications, both the terminal device and the server are processing devices with data processing capabilities, so that the data processing device in the application scenario shown in fig. 1 can be implemented by the terminal device or the server.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a schematic flowchart of a first embodiment of a data processing method provided in the present application. As shown in fig. 2, the data processing method may include the steps of:
S201, obtaining an identification file to be processed, wherein the identification file comprises at least one identity to be matched.
In the embodiments of the present application, if an operator wants to determine the account PINs of users of certain platforms, the operator can have a data processing device execute the data processing method provided herein to obtain them. Optionally, the data processing device first obtains an identification file to be processed that carries at least one identity to be matched, so that it can query and match mapping relationships in a targeted manner.
As an example, the data processing device is a terminal device operated by an operator, and the identities to be matched in the identification file may have been screened by the operator. This screening narrows the processing range of the data processing device, so that matching results are output in a targeted manner.
Illustratively, fig. 3 is a schematic diagram of an interface for uploading an identification file to be processed. As shown in fig. 3, in this embodiment the data type of the identities to be matched in the identification file is device ID, and the uploaded ID type may be the Android mobile device identification code (IMEI) or the WeChat open identity (openID, recorded as wx_openID). The mapping rule is set to "most active" (other mapping rules may be selected in other scenarios). The file to upload must be in txt format, no larger than 15M, and encoded as UTF-8-BOM. In practice, the encoding can be set by opening the txt file in the Notepad++ code editor, clicking the [Encoding] menu to view the current encoding, selecting [UTF-8-BOM], and then clicking [Save]. Illustratively, the data processing device may perform operations such as crowd estimation or convertible-PIN estimation based on the identification file shown in fig. 3. When these operations pass verification, [Create] may be selected to create the identification file.
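The upload constraints described for fig. 3 (txt format, at most 15M, UTF-8-BOM encoding) could be checked before submission, for example with the following Python sketch. Reading "15M" as 15 MiB and the names `check_identification_file` and `validate_upload` are assumptions. The sketch only checks the constraints stated above and does not reproduce the interface's own verification.

```python
import codecs
import os
from typing import List

# Assumption: "15M" in the upload interface is read as 15 MiB.
MAX_BYTES = 15 * 1024 * 1024


def check_identification_file(filename: str, data: bytes) -> List[str]:
    """Return the upload constraints (txt, <= 15M, UTF-8-BOM) that are violated."""
    problems: List[str] = []
    if not filename.lower().endswith(".txt"):
        problems.append("not a .txt file")
    if len(data) > MAX_BYTES:
        problems.append("larger than 15M")
    if not data.startswith(codecs.BOM_UTF8):
        problems.append("missing UTF-8 BOM")
    try:
        data.decode("utf-8-sig")  # strips a leading BOM if present
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    return problems


def validate_upload(path: str) -> List[str]:
    """Read a local file in binary mode and check it against the constraints."""
    with open(path, "rb") as f:
        return check_identification_file(os.path.basename(path), f.read())
```

An empty list means the file satisfies all three constraints. A file saved in Notepad++ with the [UTF-8-BOM] encoding starts with the bytes `EF BB BF`, which is what `codecs.BOM_UTF8` checks for.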
It is to be understood that the interface shown in fig. 3 for uploading the identification file is merely an example; this embodiment does not limit it, and it may be adjusted according to the actual scenario.
As another example, the data processing device may interact directly with the service server. In this case, it may obtain the identification file to be processed directly from the service server, or obtain user operation data from the service server and process it according to a preset rule to determine the identification file to be processed. The embodiments of the present application do not limit how the identification file to be processed is obtained; this can be determined according to actual requirements and is not described further here.
It is understood that, in practical applications, the identification file is an identity identification file after being subjected to a screening process. Illustratively, the type of the identity to be matched in the identification file is a screened ID type, wherein the ID type is selectable in many ways, and the purpose of screening the ID type is to narrow the query range in the subsequent matching.
Optionally, the identification file has a format requirement: within one identification file, related IDs are placed in the same row. Take an IMEI and an Application Identifier (AID) as an example. Both are IDs collected from a mobile phone: the IMEI is a hardware ID of the phone and the AID is an ID of its operating system. When a user browses an article on a mobile phone, the browsing log stores the phone's IMEI and AID in the same row, indicating that the two IDs belong to the same device. If these two types of IDs are extracted from the browsing log table, then when the identification file is generated, IDs stored in the same row of the source table should still be written to the same row of the identification file, each marked with its ID type code.
Exemplarily, fig. 4 is a schematic diagram of the storage format of identities to be matched in an identification file according to an embodiment of the present application. As shown in fig. 4, assume that a txt identification file stores two types of IDs, IMEI and AID, where the IMEI identifiers are B001 and B021, the AID identifiers are C001 and C100, and B001 and C001 are two types of IDs of the same terminal device. Querying a preset document that associates identifier types with encoding types (type) shows that the type code of IMEI is 3 and that of AID is 4. In the txt identification file, IDs having an association relationship are therefore stored in the same row, as shown in fig. 4: the first row is "B001,3,C001,4", the second row is "B021,3", and the third row is "C100,4". The columns of each row are, in order, ID1, ID1_type, ID2, ID2_type, and so on.
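A parser for the row format of fig. 4 might look like the following Python sketch. It assumes comma-separated rows of alternating ID and integer type code, as in the example, optionally preceded by a UTF-8 BOM. It also flattens the rows into one (ID, type) pair per line, which is the layout of Table 1. The function names `parse_identification_file` and `to_match_table` are hypothetical.

```python
from typing import List, Tuple

IdPair = Tuple[str, int]  # (identity identifier, encoding type code)


def parse_identification_file(text: str) -> List[List[IdPair]]:
    """Parse rows of the form ID1,ID1_type,ID2,ID2_type,... into (ID, type) pairs."""
    rows: List[List[IdPair]] = []
    for line_no, line in enumerate(text.lstrip("\ufeff").splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split(",")]
        if len(fields) % 2 != 0:
            raise ValueError(f"line {line_no}: expected ID,type pairs, got {line!r}")
        # IDs on one row stay together: they are associated (e.g. same device).
        rows.append([(fields[i], int(fields[i + 1])) for i in range(0, len(fields), 2)])
    return rows


def to_match_table(rows: List[List[IdPair]]) -> List[IdPair]:
    """Flatten the rows into one (ID, type) pair per line, as in Table 1."""
    return [pair for row in rows for pair in row]
```

Keeping the nested rows as well as the flat list matters: the flat list feeds the comparison against the known mapping relation information, while the row grouping carries the association between IDs that later steps rely on.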
S202, comparing the at least one identity to be matched with the identities in the known mapping relation information to obtain a matching result, wherein the matching result comprises at least one first identity matched to an account identification code and the account identification code corresponding to each first identity.
In the embodiments of the present application, the data processing device stores known mapping relation information between identities and account identification codes. After parsing the obtained identification file to obtain the at least one identity to be matched, the data processing device compares each identity to be matched with the identities in the known mapping relation information to determine whether it exists there. Each identity to be matched that does exist is determined to be a first identity matched to an account identification code, which yields the at least one first identity matched to an account identification code and the account identification code corresponding to each first identity.
For example, assume that the identities to be matched in the identification file are B001, B021, C001 and C100. These identifiers are listed one per row to obtain the table of identities to be matched shown in Table 1, which facilitates matching against the mapping relationship table corresponding to the known mapping relation information. Optionally, in this embodiment, Table 2 is the mapping relationship table corresponding to the known mapping relation information.
The first and second columns of Table 1 are, respectively, the identity to be matched and its type. Table 2 records the correspondence among an identifier (ID), an account identification code (PIN) and an encoding type (TYPE), and is built from the currently known mappings between IDs and account PINs.
Table 1 Table of identities to be matched
ID | TYPE
B001 | 3
C001 | 4
B021 | 3
C100 | 4
Table 2 Mapping relationship table
ID | PIN | TYPE
A001 | PIN1 | 1
B034 | PIN1 | 3
C123 | PIN1 | 4
C001 | PIN2 | 4
B077 | PIN4 | 3
Optionally, in this embodiment, comparing the data in Table 1 and Table 2 shows that C001 in Table 1 also exists in Table 2; that is, C001 is a first identity matched to an account identification code, and its account identification code (PIN) is PIN2.
S203, outputting the account identification code corresponding to each first identity.
In the embodiments of the present application, after determining the account identification code corresponding to each first identity, the data processing device can output it for display to an operator. As one example, if the data processing device has a human-machine interface, it may present the account identification code corresponding to each first identity through that interface. As another example, the data processing device may be connected to an operator's test terminal and send it the account identification code corresponding to each first identity, or the set of all such codes, so that the test terminal displays them and the operator obtains the account identification codes corresponding to the identities in a timely manner.
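Step S202 amounts to looking up each identity to be matched in the known mapping relation information. The following minimal Python sketch assumes the mapping is held in memory as a dictionary from ID to (PIN, TYPE), filled with the example data of Table 2, and matches on the ID value only, as the description does. At larger scale, the same comparison could be expressed as a join between warehouse tables. The name `match_identifiers` is hypothetical.

```python
from typing import Dict, List, Tuple

# Known mapping relation information from Table 2: ID -> (PIN, TYPE).
KNOWN_MAPPING: Dict[str, Tuple[str, int]] = {
    "A001": ("PIN1", 1),
    "B034": ("PIN1", 3),
    "C123": ("PIN1", 4),
    "C001": ("PIN2", 4),
    "B077": ("PIN4", 3),
}


def match_identifiers(
    to_match: List[Tuple[str, int]],
    known_mapping: Dict[str, Tuple[str, int]],
) -> Tuple[Dict[str, str], List[Tuple[str, int]]]:
    """Split IDs into first identities (ID -> PIN) and unmatched second identities."""
    matched: Dict[str, str] = {}
    unmatched: List[Tuple[str, int]] = []
    for id_value, type_code in to_match:
        entry = known_mapping.get(id_value)
        if entry is not None:
            matched[id_value] = entry[0]  # first identity: keep its PIN
        else:
            unmatched.append((id_value, type_code))  # second identity
    return matched, unmatched


# Table 1: identities to be matched, as (ID, TYPE).
TABLE_1 = [("B001", 3), ("C001", 4), ("B021", 3), ("C100", 4)]
first, second = match_identifiers(TABLE_1, KNOWN_MAPPING)
print(first)   # {'C001': 'PIN2'}
print(second)  # [('B001', 3), ('B021', 3), ('C100', 4)]
```

Returning the unmatched IDs alongside the matches keeps the second identities available for the association-based steps that follow.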
In the data processing method provided by the embodiment of the present application, after the identification file to be processed is obtained, the at least one identity to be matched in the identification file is compared with the identities in the known mapping relation information to obtain a matching result, which comprises at least one first identity matched to an account identification code and the account identification code corresponding to each first identity; finally, the account identification code corresponding to each first identity is output. This technical solution matches identifiers (IDs) to account identification codes (PINs) automatically, without manual participation, which reduces cost and improves data processing efficiency.
Optionally, on the basis of the foregoing embodiment, fig. 5 is a schematic flow chart of a second embodiment of the data processing method provided in the embodiment of the present application. As shown in fig. 5, the data processing method may further include the steps of:
S501, parsing the identification file to obtain the at least one identity to be matched.
In the embodiment of the application, in order to facilitate mapping matching between the identity and the account PIN, after the data processing device obtains the identification file, the data processing device may first analyze the identification file to obtain at least one identity to be matched, which is included in the identification file.
For the identification file shown in fig. 4, the identities to be matched are B001, C001, B021 and C100. Optionally, the data processing device may use the Hive data warehouse tool to store each parsed identity to be matched in a Hive table, such as Table 1 above.
S502, adding each identity to be matched to the known first identity relation information to obtain second identity relation information.
In one possible design, it is assumed that known first identity relationship information, obtained from the association relationships among known IDs, is stored in the data processing device. Illustratively, Table 3 is the identity relationship table corresponding to the first identity relationship information and is generated from part of the currently existing ID data.
Table 3 Identity relationship table corresponding to the first identity relationship information
1 | 2 | 3 | 4 | 5
A001 | -- | B034 | C123 | --
-- | -- | -- | C001 | --
-- | -- | B077 | -- | --
In Table 3, the column headings 1 to 5 are the numbers of the ID encoding types, and IDs stored in the same row have some association relationship, for example belonging to the same device. Table 3 does not contain account PINs, and "--" denotes a null value.
Specifically, as Table 3 shows, the first identity relationship information may be represented by an identity relationship table in which identities in the same row have an association relationship and belong to different encoding types, the association relationship including at least one of the following: belonging to the same device, and belonging to the same user. For example, A001, B034 and C123 in the first row of Table 3 may be three different types of IDs of the same device.
In this step, the data processing device may add the identities to be matched in the identification file (B001, C001, B021 and C100) to the identity relationship table of the first identity relationship information shown in Table 3, obtaining the identity relationship table of the second identity relationship information shown in Table 4. B001 joins the row of its associated identifier C001, while B021 and C100, which have no associated identifier in the table, each occupy a new row; B001, B021 and C100 are therefore the newly added identifiers. In Table 4, "--" denotes a null value. As in Table 3, the identities in each row of Table 4 have an association relationship and belong to different encoding types.
Table 4 Identity relationship table corresponding to the second identity relationship information
1 | 2 | 3 | 4 | 5
A001 | -- | B034 | C123 | --
-- | -- | B001 | C001 | --
-- | -- | B077 | -- | --
-- | -- | B021 | -- | --
-- | -- | -- | C100 | --
In this embodiment, the identity to be matched is added to the identity relationship table corresponding to the known first identity relationship information, so that convenience is provided for subsequent matching.
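Step S502 can be sketched in Python as follows. The sketch assumes, hypothetically, that each row of the identity relationship table is a dictionary from encoding type to ID and that the identification file has already been parsed into rows of (ID, type) pairs. A file row joins the first table row that already contains one of its IDs; otherwise it starts a new row. The conflict policy (an ID whose type slot is already taken goes into a row of its own) is an assumption that the description does not specify, and the name `add_to_relation_table` is illustrative. With the data of Table 3 and fig. 4, the result has the same rows as Table 4.

```python
from typing import Dict, List, Tuple

Row = Dict[int, str]  # encoding type -> ID; one row = associated IDs


def add_to_relation_table(
    relation_rows: List[Row], file_rows: List[List[Tuple[str, int]]]
) -> List[Row]:
    """Return second identity relation information (the input is not modified)."""
    rows = [dict(row) for row in relation_rows]
    for file_row in file_rows:
        # Join the first existing row that already holds one of these IDs.
        target = next(
            (row for row in rows
             if any(row.get(type_code) == id_value for id_value, type_code in file_row)),
            None,
        )
        if target is None:
            target = {}
            rows.append(target)
        for id_value, type_code in file_row:
            current = target.get(type_code)
            if current is None:
                target[type_code] = id_value
            elif current != id_value:
                # Assumed policy: a row holds one ID per type, so a conflicting
                # ID is kept in a row of its own instead of being dropped.
                rows.append({type_code: id_value})
    return rows


# Table 3 (first identity relation information) and the rows of fig. 4.
TABLE_3: List[Row] = [{1: "A001", 3: "B034", 4: "C123"}, {4: "C001"}, {3: "B077"}]
FILE_ROWS = [[("B001", 3), ("C001", 4)], [("B021", 3)], [("C100", 4)]]

for row in add_to_relation_table(TABLE_3, FILE_ROWS):
    print([row.get(t, "--") for t in range(1, 6)])
```

Copying the input rows keeps the first identity relationship information unchanged, so it can still be compared against the new second identity relationship information.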
In an embodiment of the present application, as an example, the matching result further includes: at least one second identity not matched to the account identification code.
Specifically, according to the step S202 in the embodiment shown in fig. 2, if the first identity matching the account id in B001, B021, C001, and C100 is C001, the second identity not matching the account id includes: b001, B021, C100, at this time, 3 IDs (B001, B021, C100) that are not matched to the PIN may be stored in table 5, the table 5 indicating the second ID table that is not matched to the account ID. Similarly, - - -, in Table 5 denotes a null value.
Table 5 Table of second identities not matched to an account identification code
Correspondingly, in this embodiment, as shown in fig. 5, the data processing method may further include the following steps:
S503, judging, for each second identity, whether a target identity having an association relationship with the second identity exists in the second identity relationship information; if yes, executing S504; otherwise, ending.
For example, since the IDs in each row of the identifier relationship table corresponding to the second identifier relationship information have an association relationship and belong to different encoding types, each second identity in table 5 can be compared with the IDs in table 4 to determine whether table 4 contains a target identity having an association relationship with that second identity.
For example, comparing table 5 with table 4 shows that the second identity B001 and the identity C001 are located in the same row and therefore have an association relationship; C001 is thus the target identity associated with B001.
S504, determining a target account identification code corresponding to the target identity according to the target identity and the known mapping relation information.
For example, by querying table 2, that is, the mapping relationship table corresponding to the known mapping relationship information, the data processing device may find that C001 is matched to PIN2, that is, the account identification code corresponding to C001 is PIN2.
S505, determining the target account identification code as the account identification code corresponding to the second identity.
In this step, as can be seen from S503 and S504, the target identity associated with the second identity B001 is C001, and the account identification code corresponding to C001 is PIN2; therefore, the account identification code corresponding to the second identity B001 is also determined to be PIN2.
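Steps S503 to S505, together with the matching of step S202 that produces the second identities, can be sketched as follows. The sketch assumes that the known mapping relationship information is a dictionary from ID to PIN whose contents agree with table 6, and that when several associated IDs in a row have PINs, the first one found is used; the function names and this tie-breaking rule are hypothetical, not taken from the embodiment.

```python
from typing import Dict, List, Optional, Tuple

Row = List[Optional[str]]  # one row of the identity relationship table; None stands for "--"


def split_by_match(ids: List[str], known: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """S202: first identities (matched to a PIN) versus second identities (unmatched)."""
    matched = {i: known[i] for i in ids if i in known}
    unmatched = [i for i in ids if i not in known]
    return matched, unmatched


def resolve_unmatched(unmatched: List[str], table: List[Row], known: Dict[str, str]) -> Dict[str, str]:
    """S503 to S505: borrow the PIN of an associated ID in the same row."""
    resolved: Dict[str, str] = {}
    for second_id in unmatched:
        # S503: IDs in the same row as the second identity are associated with it.
        row = next((r for r in table if second_id in r), None)
        if row is None:
            continue
        for target_id in row:
            if target_id is None or target_id == second_id:
                continue
            pin = known.get(target_id)  # S504: look up the target identity's PIN
            if pin is not None:
                resolved[second_id] = pin  # S505: reuse it for the second identity
                break  # hypothetical rule: the first mapped target in the row wins
    return resolved


# Table 4 (second identity relationship information), columns 1 to 5.
TABLE_4: List[Row] = [
    ["A001", None, "B034", "C123", None],
    [None, None, "B001", "C001", None],
    [None, None, "B077", None, None],
    [None, None, "B021", None, None],
    [None, None, None, "C100", None],
]
# Assumed known mapping relationship information (consistent with table 6).
KNOWN = {"A001": "PIN1", "B034": "PIN1", "C123": "PIN1", "C001": "PIN2", "B077": "PIN4"}

matched, unmatched = split_by_match(["B001", "C001", "B021", "C100"], KNOWN)
resolved = resolve_unmatched(unmatched, TABLE_4, KNOWN)  # {"B001": "PIN2"}
```

B021 and C100 stay unresolved here because no other ID in their rows has a known PIN, which matches the null PIN values they later receive in table 6.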
Further, in an embodiment of the present application, the data processing method may further include the following steps:
S506, updating the known mapping relation information according to the mapping relation between the second identity and the target account identification code.
For example, after the target account identification code corresponding to the second identity is determined, the mapping relationship between the second identity and the target account identification code may be newly added to the mapping relationship table corresponding to the known mapping relationship information.
For example, the mapping relationship between B001 and PIN2 is added to table 2 to obtain the mapping relationship table (ID→PIN mapping relationship table) corresponding to the updated known mapping relationship information, shown as table 6.
Table 6 Updated mapping relationship table corresponding to the known mapping relationship information
| ID | PIN | TYPE |
| A001 | PIN1 | 1 |
| B034 | PIN1 | 3 |
| C123 | PIN1 | 4 |
| C001 | PIN2 | 4 |
| B077 | PIN4 | 3 |
| B001 | PIN2 | 3 |
| B021 | -- | 3 |
| C100 | -- | 4 |
Compared with table 2, table 6 adds a new mapping relationship between B001 and PIN2, and the resulting table 6 can continue to provide the automatic matching service subsequently. Likewise, "--" in table 6 indicates a null value.
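The update of step S506 can be sketched as follows. This is a hypothetical sketch: it assumes that table 2 consists of the five rows of table 6 that precede B001, that the encoding type of each new ID is already known, and that IDs still lacking a PIN are registered with a null PIN, as B021 and C100 are in table 6.

```python
from typing import Dict, List, Optional, Tuple

MappingRow = Tuple[str, Optional[str], int]  # (ID, PIN or None for "--", TYPE)


def update_mapping_table(
    table: List[MappingRow],
    to_be_matched: List[str],
    resolved: Dict[str, str],
    type_of: Dict[str, int],
) -> List[MappingRow]:
    """S506: add newly resolved ID -> PIN relations and register new IDs."""
    rows: List[list] = [list(row) for row in table]
    index = {row[0]: row for row in rows}
    for identifier in to_be_matched:
        row = index.get(identifier)
        if row is None:  # a new ID joins the ID data assets; its PIN may still be unknown
            row = [identifier, None, type_of[identifier]]
            rows.append(row)
            index[identifier] = row
        if row[1] is None and identifier in resolved:
            row[1] = resolved[identifier]  # e.g. B001 -> PIN2 found via S503 to S505
    return [(row[0], row[1], row[2]) for row in rows]


# Assumed content of table 2: the rows of table 6 that existed before this update.
TABLE_2: List[MappingRow] = [
    ("A001", "PIN1", 1), ("B034", "PIN1", 3), ("C123", "PIN1", 4),
    ("C001", "PIN2", 4), ("B077", "PIN4", 3),
]
TABLE_6 = update_mapping_table(
    TABLE_2,
    ["B001", "C001", "B021", "C100"],
    {"B001": "PIN2"},
    {"B001": 3, "C001": 4, "B021": 3, "C100": 4},
)
```

Because a row with a null PIN is filled in whenever a later run resolves it, repeating the procedure on new identification files gradually increases the number of IDs matched to a PIN.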
Further, as shown in table 6, the IDs to be matched are also added to table 6. This increases the volume of ID data assets and, in turn, allows more IDs to be matched to a PIN through the internal model in subsequent processing.
According to the data processing method provided by this embodiment of the application, each identity to be matched in the identification file is added to the known first identity relationship information to obtain second identity relationship information; for each second identity, it is determined whether a target identity having an association relationship with the second identity exists in the second identity relationship information; if so, the target account identification code corresponding to the target identity is determined according to the target identity and the known mapping relationship information and is taken as the account identification code corresponding to the second identity; and the known mapping relationship information is updated according to the mapping relationship between the second identity and the target account identification code. In this technical solution, the association relationships between identities and the mapping relationships between identities and account identification codes drive a virtuous cycle of automatic data updating and iteration, which increases the number of identities automatically matched to account identification codes, improves data processing efficiency, and optimizes background data assets.
Exemplarily, in an embodiment of the present application, fig. 6 is a schematic flowchart of a third embodiment of a data processing method provided in the embodiment of the present application. As shown in fig. 6, before the above S201, the data processing method may further include the steps of:
S601, acquiring a historical data set.
In an embodiment of the application, a certain number of historical data sets exist in the data platform, based on the data accumulated by the data platform and the results of manual processing. Optionally, the historical data set may include some ID data assets and one-way mapping relationship data from IDs to account identification codes (ID→PIN).
When the data processing platform executes the technical scheme of the application and needs to determine the mapping relation between the identity and the account identification code, the data processing platform can firstly acquire a historical data set, and determine the known mapping relation information and the first identity identification relation information by analyzing the historical data set, so that a foundation is laid for subsequent automatic matching.
S602, processing the data in the historical data set to obtain a plurality of identity identifications and mapping relation data of the identity identifications and the account identification codes included in the historical data set.
Optionally, in this embodiment of the present application, the relationships among the data in the historical data set may be ambiguous, so the data processing device may separate the ID data assets from the ID-to-account-identification-code (ID→PIN) mapping relationship data for subsequent direct use.
S603, based on whether the association relationship exists among the identity identifications, classifying and storing the identity identifications in the historical data set to obtain first identity identification relationship information.
In this step, to distinguish the types of identities, the identity types may first be numbered; the identities in the historical data set are then stored according to their types, and identities having an association relationship are stored together according to that relationship. For example, identities belonging to the same device are stored in the same row, and identities belonging to different devices are stored in different rows, so as to obtain the known first identity relationship information.
For example, assume that the identities in the historical data set include A001, B034, C123, C001, and B077, where A001, B034, and C123 are different types of IDs of the same device, while C001 and B077 are IDs of other, different devices. Classifying and storing A001, B034, C123, C001, and B077 then yields the identity relationship table corresponding to the first identity relationship information shown in table 3.
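The classification of step S603 can be sketched as a group-by on the association key. The sketch assumes, hypothetically, that each historical record carries a device key naming the device on which the ID was observed; the device keys, the helper name, and the first-letter rule for the encoding type are illustrative assumptions, and the five-column layout follows table 4.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule: encoding type from the ID's first letter (see the TYPE column of table 6).
TYPE_BY_PREFIX = {"A": 1, "B": 3, "C": 4}
NUM_TYPES = 5

Row = List[Optional[str]]  # row[t - 1] holds the ID of type t; None stands for "--"


def build_relationship_table(records: List[Tuple[str, str]]) -> List[Row]:
    """S603: store IDs of the same device in one row, one column per encoding type."""
    rows: Dict[str, Row] = {}  # dicts keep insertion order, so rows follow first appearance
    for device_key, identifier in records:
        row = rows.setdefault(device_key, [None] * NUM_TYPES)
        row[TYPE_BY_PREFIX[identifier[0]] - 1] = identifier
    return list(rows.values())


# Hypothetical (device key, ID) records extracted from the historical data set.
RECORDS = [("dev-1", "A001"), ("dev-1", "B034"), ("dev-1", "C123"),
           ("dev-2", "C001"), ("dev-3", "B077")]
TABLE_3 = build_relationship_table(RECORDS)
```

Grouping by "same user" instead of "same device" would only change which key is passed in each record.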
S604, generating known mapping relation information based on the mapping relation data of the identity and the account identification code and the coding type of the identity.
In this embodiment, in order to implement automatic matching between an identity identifier and an account identifier, after obtaining mapping relationship data between the identity identifier and the account identifier, the data processing device first establishes a mapping relationship table, where a first column of the mapping relationship table represents the identity identifier, a second column represents the account identifier, and a third column represents a coding type of the identity identifier. Therefore, based on the mapping relationship data between the identity and the account identification code and the encoding type of the identity, a mapping relationship table corresponding to the known mapping relationship information as shown in table 2 can be generated.
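Generating the three-column mapping relationship table of step S604 can be sketched as follows. The sketch is hypothetical: it reads the "one-way" ID→PIN mapping as each ID mapping to at most one PIN and rejects conflicting pairs, and it again derives the encoding type from the ID's first letter only to agree with table 6; neither rule is stated in the embodiment.

```python
from typing import Dict, List, Tuple

# Hypothetical rule: encoding type from the ID's first letter (matches the TYPE column of table 6).
TYPE_BY_PREFIX = {"A": 1, "B": 3, "C": 4}


def build_mapping_table(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """S604: turn raw (ID, PIN) pairs into (ID, PIN, TYPE) rows."""
    id_to_pin: Dict[str, str] = {}
    for identifier, pin in pairs:
        previous = id_to_pin.setdefault(identifier, pin)
        # Assumption: an ID maps to at most one PIN, so a conflicting pair is rejected.
        if previous != pin:
            raise ValueError(f"{identifier} maps to both {previous} and {pin}")
    # Column 1: identity, column 2: account identification code, column 3: encoding type.
    return [(i, pin, TYPE_BY_PREFIX[i[0]]) for i, pin in id_to_pin.items()]


RAW_PAIRS = [("A001", "PIN1"), ("B034", "PIN1"), ("C123", "PIN1"),
             ("C001", "PIN2"), ("B077", "PIN4"), ("A001", "PIN1")]
MAPPING_TABLE = build_mapping_table(RAW_PAIRS)  # the duplicate A001 pair is merged
```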
According to the data processing method provided by this embodiment of the application, the data in the acquired historical data set is processed to obtain a plurality of identities and the mapping relationship data between the identities and the account identification codes; the identities in the historical data set are classified and stored based on whether association relationships exist among them to obtain the first identity relationship information; and the known mapping relationship information is finally generated based on the mapping relationship data between the identities and the account identification codes and the encoding types of the identities. In this technical solution, the first identity relationship information and the known mapping relationship information are obtained from the historical data set, which provides the conditions for subsequent automatic matching, improves working efficiency, and optimizes background data assets.
Further, on the basis of the above embodiment, the data processing method provided by the present application may further include the following steps:
pushing information to the device where each first identity is located based on the account identification code corresponding to that first identity.
Optionally, for identities (e.g., device IDs) acquired by service personnel from different channels, after the account identification codes corresponding to these identities are determined through the embodiments of fig. 2 to fig. 6, the data processing device may push messages by means of the account identification codes, for example, pushing information to the device where each first identity is located through the account identification code corresponding to that first identity, so that the pushed information reaches the specific user, which provides a precondition for subsequently improving service efficiency.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Fig. 7 is a schematic structural diagram of an embodiment of a data processing apparatus provided in the present application. Referring to fig. 7, the data processing apparatus may include:
an obtaining module 701, configured to obtain an identifier file to be processed, where the identifier file includes: at least one identity to be matched;
a processing module 702, configured to compare the at least one identity to be matched with the identities in the known mapping relationship information to obtain a matching result, where the matching result includes: at least one first identity matched to an account identification code, and the account identification code corresponding to each first identity;
an output module 703, configured to output the account identification code corresponding to each first identity.
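The division into modules 701 to 703 can be mirrored in code, for instance as three small Python classes. This sketch is hypothetical: the class and method names are illustrative, and it assumes the identification file is plain text with one identity per line, a format the embodiment does not specify.

```python
from typing import Dict, List, Tuple


class ObtainingModule:  # corresponds to obtaining module 701
    def obtain(self, file_text: str) -> List[str]:
        # Hypothetical file format: one identity per line; blank lines are ignored.
        return [line.strip() for line in file_text.splitlines() if line.strip()]


class ProcessingModule:  # corresponds to processing module 702
    def __init__(self, known_mapping: Dict[str, str]) -> None:
        self.known_mapping = known_mapping  # known ID -> PIN mapping relationship information

    def match(self, ids: List[str]) -> Tuple[Dict[str, str], List[str]]:
        matched = {i: self.known_mapping[i] for i in ids if i in self.known_mapping}
        unmatched = [i for i in ids if i not in self.known_mapping]
        return matched, unmatched


class OutputModule:  # corresponds to output module 703
    def output(self, matched: Dict[str, str]) -> List[str]:
        # One "ID<TAB>PIN" line per first identity.
        return [f"{identifier}\t{pin}" for identifier, pin in matched.items()]


ids = ObtainingModule().obtain("B001\nC001\n\nB021\nC100\n")
matched, unmatched = ProcessingModule({"C001": "PIN2"}).match(ids)
lines = OutputModule().output(matched)
```

Keeping the modules separate mirrors the logical division described for fig. 7; in practice they may equally be merged into one component.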
In a possible design of this embodiment of the present application, the processing module 702 is further configured to:
analyzing the identification file to obtain the at least one identity to be matched;
and adding each identity to be matched to the known first identity relation information to obtain second identity relation information.
Optionally, the matching result further includes: at least one second identity which is not matched with the account identification code;
correspondingly, the processing module 702 is further configured to:
for each second identity, judging whether a target identity having an association relation with the second identity exists in the second identity relation information;
if so, determining a target account identification code corresponding to the target identity according to the target identity and the known mapping relation information;
and determining the target account identification code as the account identification code corresponding to the second identity.
Optionally, the processing module 702 is further configured to update the known mapping relationship information according to the mapping relationship between the second identity and the target account identification code.
In another possible design of the embodiment of the present application, the obtaining module 701 is further configured to obtain a historical data set before obtaining the identification file to be processed;
the processing module 702 is further configured to:
processing the data in the historical data set to obtain a plurality of identity identifications and mapping relation data of the identity identifications and account identification codes included in the historical data set;
based on whether the association relationship exists among the identity identifications, classifying and storing the identity identifications in the historical data set to obtain first identity identification relationship information;
and generating the known mapping relation information based on the mapping relation data of the identity and the account identification code and the coding type of the identity.
In another possible design of the embodiment of the application, the output module 703 is further configured to push information to a device corresponding to each first identity identifier based on an account identification code corresponding to the first identity identifier.
The apparatus provided in the embodiment of the present application may be used to implement the technical solution described in the embodiment of the method, and the implementation principle and the technical effect are similar, which are not described herein again.
It should be noted that the division of the above apparatus into modules is only a logical division; in actual implementation, the modules may be wholly or partially integrated into one physical entity or may be physically separate. These modules may all be implemented as software invoked by a processing element, or all be implemented in hardware, or some may be implemented as software invoked by a processing element and the others in hardware. For example, the processing module may be a separately arranged processing element, or may be integrated in a chip of the apparatus; alternatively, it may be stored in a memory of the apparatus in the form of program code, and a processing element of the apparatus invokes and executes the function of the processing module. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit having signal processing capability. In implementation, each step of the above method or each of the above modules may be implemented by an integrated logic circuit of hardware in the processing element or by instructions in the form of software.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced wholly or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
Fig. 8 is a schematic structural diagram of an embodiment of a data processing device provided in the present application. As shown in fig. 8, the data processing device may include a processor 801, a memory 802, a communication interface 803, and a system bus 804, where the memory 802 and the communication interface 803 are connected to the processor 801 through the system bus 804 and communicate with one another; the memory 802 is used to store a computer program, the communication interface 803 is used to communicate with other devices, and the processor 801 implements the technical solution of the method embodiment when executing the computer program.
In fig. 8, the processor 801 may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The memory 802 may include a random access memory (RAM), a read-only memory (ROM), and a non-volatile memory, such as at least one disk memory.
The communication interface 803 is used to enable communication between the data processing device and other devices (e.g., clients, read-write databases, and read-only databases).
The system bus 804 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
Optionally, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions; when the computer-executable instructions run on a computer, the computer is caused to execute the technical solution described in the foregoing method embodiment.
Optionally, an embodiment of the present application further provides a chip for executing the instruction, where the chip is configured to execute the technical solution described in the foregoing method embodiment.
There is also provided, in accordance with an embodiment of the present application, a computer program product, including: a computer program, stored in a readable storage medium, from which at least one processor of the data processing device can read the computer program, the execution of the computer program by the at least one processor causing the data processing device to carry out the solution provided by any of the embodiments described above.
Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.