Detailed Description
For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same.
It should be noted that, unless otherwise defined, technical or scientific terms used in the embodiments of the present disclosure should be given the ordinary meaning understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first," "second," and the like, as used in the embodiments of the present disclosure, do not denote any order, quantity, or importance, but are used only to distinguish one element from another. The words "comprising," "comprises," and the like mean that the element or item preceding the word encompasses the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected," "coupled," and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The terms "upper," "lower," "left," "right," and the like merely indicate relative positional relationships, which may change accordingly when the absolute position of the described object changes.
It will be appreciated that before using the technical solutions of the various embodiments in the disclosure, the user may be informed of the type of personal information involved, the range of use, the use scenario, etc. in an appropriate manner, and obtain the authorization of the user.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly inform the user that the requested operation will require obtaining and using the user's personal information. Based on the prompt information, the user can then decide whether to provide personal information to the software or hardware, such as the electronic device, application program, server, or storage medium, that performs the operations of the technical solution of the present disclosure.
As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user in, for example, a pop-up window, in which the prompt information may be presented as text. In addition, the pop-up window may carry a selection control that allows the user to choose "agree" or "disagree" to providing personal information to the electronic device.
It will be appreciated that the above-described notification and user authorization process is merely illustrative, and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
Fig. 1 shows a schematic diagram of an exemplary system 100 provided by an embodiment of the present disclosure.
As shown in fig. 1, the system 100 may be used to perform functions such as table creation and maintenance, and may include terminal devices 102A, 102B, a server 106, and a database server 108. The system 100 may further include a medium (e.g., a network) that provides communication links between the terminal devices 102A, 102B and the server 106 and database server 108. The network may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables, among others.
Various Applications (APP) or software, such as a collaborative office-type application or software, an image processing-type application or software, a video conference-type application or software, a reading-type application or software, a video-type application or software, a social-type application or software, a payment-type application or software, a web browser, an instant messaging tool, etc., may be installed on the terminal devices 102A, 102B. In some embodiments, these applications or software may be used for table creation and maintenance, etc.
The terminal devices 102A, 102B may be hardware or software. When the terminal devices 102A, 102B are hardware, they may be various electronic or computer devices with display screens, including but not limited to smartphones, tablets, electronic book readers, MP3 players, laptop computers, desktop computers (PCs), and the like. When the terminal devices 102A, 102B are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. The present disclosure is not particularly limited in this respect.
The server 106 may be a server that provides various services, such as a background server that supports the applications or software displayed on the terminal devices 102A, 102B. The database server 108 may likewise be a server that provides various database services. It will be appreciated that the database server 108 may be omitted from the system 100, in which case the server 106 may implement the relevant functions of the database server 108.
The server 106 and the database server 108 may be hardware or software. When they are hardware, they may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When they are software, they may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. The present disclosure is not particularly limited in this respect.
It should be noted that, the table data processing method provided by the embodiment of the present disclosure may be executed by the terminal devices 102A and 102B or interactively executed by each device in the system 100. It should be understood that the number of terminal devices, users, servers and database servers in fig. 1 is merely illustrative. There may be any number of terminal devices, users, servers, and database servers, as desired for implementation.
In some exemplary scenarios, the user 104A or the user 104B may create and maintain a table through collaborative office-type software or applications installed in the terminal device 102A or the terminal device 102B, respectively. Optionally, the user 104A and the user 104B may jointly maintain the same table, e.g., view, add, modify, or delete data rows in the table. Optionally, the table may be a table for multidimensional management of a project, and may therefore include a plurality of data tables in a plurality of dimensions, each of which may store the data of its own dimension and correspond to the other data tables.
As an optional embodiment, the user 104A and/or the user 104B may also associate a data row of a B data table with a field of a data row of an A data table in the table, so that the information of the associated data row of the B data table can be viewed from that field in the A data table.
In some embodiments, the association may include a one-way association and a two-way association.
One-way (unidirectional) association may mean that certain fields in the table support a one-way association function, i.e., a data row of the B data table is associated with a field in the A data table. After the association is completed, the user can click the field in the A data table to view the entire associated data row, and can further jump to the B data table associated with the field. In some embodiments, a one-way association field may also be associated with a data row in the current data table (i.e., the A data table).
Two-way (bidirectional) association may mean that certain fields in the table support a two-way association function. Using a two-way association field, one or more data rows of the B data table may be associated with a data row of the A data table, and each associated data row of the B data table is then automatically associated back with the corresponding data row of the A data table. The user can view data of the B data table directly in the A data table, jump to the B data table, and jump back from the B data table to the A data table with a single click.
It follows that when the user 104A and/or the user 104B performs an association operation in a current data table (e.g., the A data table), the user first needs to find the data row to be associated in another data table (e.g., the B data table) and then associate the two through the association operation. However, the inventors of the present disclosure have found that when the other data table (e.g., the B data table) contains a large number of data rows, the user may need to browse through them repeatedly to find the data row to be associated, which makes the operation inconvenient.
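The difference between the two association types can be illustrated with a minimal Python sketch. The names used here (Row, links, associate, back_field, and the field names "commodity_library" and "orders") are hypothetical and chosen only for illustration; the disclosure does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Row:
    row_id: str
    # Maps an association-field name to the ids of the rows linked through it.
    links: dict[str, set[str]] = field(default_factory=dict)


def associate(src: Row, field_name: str, dst: Row,
              back_field: Optional[str] = None) -> None:
    """Associate dst with src's field; also link back when back_field is given."""
    src.links.setdefault(field_name, set()).add(dst.row_id)
    if back_field is not None:  # two-way association: dst points back to src
        dst.links.setdefault(back_field, set()).add(src.row_id)


# One-way: only the order row (A data table) knows about the commodity row.
order_1, item_7 = Row("order-1"), Row("item-7")
associate(order_1, "commodity_library", item_7)

# Two-way: the commodity row (B data table) is linked back automatically.
order_2, item_9 = Row("order-2"), Row("item-9")
associate(order_2, "commodity_library", item_9, back_field="orders")
```

In this sketch, the automatic back-link is what would let a user jump from the B data table back to the A data table.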
As previously described, the user 104A and/or the user 104B may utilize collaborative office-like software or applications installed in the terminal devices 102A, 102B to create and maintain tables.
Illustratively, in an initial state, the user 104A may open the collaborative office-type software or application using the terminal device 102A and then open the table to be maintained or create a new table.
In some embodiments, the table may include multiple data tables in multiple dimensions. For example, the table may be an order and merchandise statistics table, and thus may further include a data table for counting orders and a data table for counting merchandise, and other related data tables, and so forth.
Illustratively, the user 104A may further open one of the data tables for data maintenance, e.g., by triggering a tab corresponding to the data table in the table.
Fig. 2A shows a schematic diagram of an exemplary page 200, according to an embodiment of the present disclosure.
As shown in fig. 2A, the user 104A triggers the first data table so that the first data table 202 is displayed in the page 200. The first data table 202 may include a plurality of first data rows 2022 (illustratively, 12 first data rows 2022 are shown in fig. 2A). Illustratively, the first data table 202 may be an order table for recording orders. In other words, each data row in the order table records one order. Thus, a data row in a data table may also be referred to as a record, while a data row of another data table that is associated with the current data table (or a field thereof) may also be referred to as an associated record.
In some embodiments, as shown in fig. 2A, the first data table 202 may further include a plurality of fields, each of which may correspond to one item of attribute information of the first data rows 2022. Taking the order table as an example, the order table may include fields such as order number, order status, responsible person, service line, order time, commodity library, and the like, and the corresponding field information is filled in for each first data row 2022 so that each first data row 2022 records one order.
Optionally, as shown in fig. 2A, a field addition control 204 may be further included in the page 200, and the user 104A may enrich the contents of the first data table 202 by triggering the field addition control 204 to add a new field to the first data table 202.
In some embodiments, when a particular field supports a one-way association or two-way association function, the user 104A may associate that field with a data row in another data table. Illustratively, as shown in fig. 2A, the commodity library field may be a field that supports one-way or two-way association, so that the user 104A may perform an association operation on the commodity library field of a data row. Optionally, whether a field can be associated with other data tables, which data tables it can be associated with, and whether the association is one-way or two-way may be preconfigured. Illustratively, the commodity library field may be associated with a commodity table, so that a corresponding data row can be looked up in the commodity table and associated into the commodity library field of the first data table 202.
Illustratively, the user 104A may generate a data association request by triggering (e.g., clicking or double-clicking) the target field 2024 of the target data row 2022A of the first data table 202 in the page 200. In fig. 2A, the target data row 2022A is, for example, the third data row of the first data table 202, which may be the data row currently being edited by the user, and the target field 2024 is, for example, the cell corresponding to the commodity library field of the target data row 2022A. The terminal device 102A may then display the first page 210 in response to the data association request, as shown in fig. 2B.
In some embodiments, as shown in fig. 2B, the first page 210 may be displayed floating over the page 200. Compared with jumping to another page, this gives the user 104A the impression of still maintaining the first data table 202 within the page 200, which better matches the user's habits and thereby improves the user experience.
As shown in fig. 2B, optionally, the second data table 212 associated with the first data table 202 may be further displayed in the first page 210, so that the user 104A may find the second data row 2122 of the second data table 212 that the user wants to associate with in the first page 210. Optionally, the second data table 212 is also associated with the target field 2024, and the association of the second data table 212 with the target field 2024 may be preconfigured on the field corresponding to the target field 2024, so that when the target field 2024 is triggered, the first page 210 may present the content of the second data table 212. Illustratively, the second data table 212 may be a commodity table for recording commodity information, which may be associated with the commodity library field of the first data table 202.
As shown in fig. 2B, the second data table 212 may include a plurality of second data lines 2122, and when the display range of the first page 210 is insufficient to display all of the second data lines 2122 of the second data table 212, in some embodiments, a scroll bar 2124 may be displayed on one side of the second data table 212. The scroll bar 2124 is used to scroll the second data table 212 in a first direction (e.g., a vertical direction of the first page 210) so that the user 104A can slide through other second data rows 2122 in the second data table 212 by moving the scroll bar 2124.
In some embodiments, as shown in fig. 2B, a search field 214 may also be included in the first page 210, such that the user 104A may search the second data table 212 for a second data row 2122 that the user wants to associate by entering a search term in the search field 214, thereby facilitating the user to find the second data row 2122 that the user wants to associate more quickly. Alternatively, the search bar 214 may be displayed on top of the first page 210 so that the user 104A may more easily view the search bar 214 and more conveniently use it.
However, the inventors of the present disclosure have found that when the second data table 212 contains a large number of second data rows 2122, the first page 210 cannot display the entire contents of the second data table 212. In particular, when the number of second data rows 2122 is very large, it may be difficult for the user 104A to find the second data row to be associated. Even if a search is performed using the search bar 214, suitable results may not be found if the search term is inaccurate.
In some embodiments, data rows that may be associated can be recommended to the user based on keyword retrieval, vector similarity calculation, data analysis, and the like. However, the inventors of the present disclosure have found that recommending with any one of these methods alone (i.e., with a single recall strategy) has certain limitations, such as insufficient coverage, low accuracy, and insufficient personalization. Specifically, a single recall strategy can hardly cover all possible association scenarios and may miss some valuable associated records. A single recall strategy may also be affected by factors such as data sparsity, semantic ambiguity, and noisy data, making the recommendation result inaccurate. Finally, a single recall strategy can hardly take full account of users' individual needs and scenario differences, so the recommendation result tends to be generic and insufficiently targeted.
In view of this, the embodiments of the present disclosure provide a table data processing method that fuses at least two recall strategies, which may improve the coverage and accuracy of data recommendation to a certain extent and thereby solve or partially solve the above-mentioned problems.
Fig. 3A shows a flow diagram of an exemplary tabular data processing method 300 provided by an embodiment of the present disclosure. The method 300 may be used to recommend associable data lines. Alternatively, the method 300 may be implemented by the terminal device 102A or 102B of FIG. 1 alone, or by the server 106 of FIG. 1 alone, or interactively by the devices in the system 100 of FIG. 1. As shown in fig. 3A, the method 300 may further include the following steps.
At step 302, feature information of a target data row 2022A may be determined in response to an associated data recommendation request for the target data row 2022A in the first data table 202.
Optionally, the associated data recommendation request may be triggered by clicking the recommendation control 216 of the first page 210, so that the terminal device 102A learns that the user 104A currently wants to use the function of recommending associated data for the target data row 2022A. In some embodiments, because the first page 210 is displayed by triggering the target field 2024 of the target data row 2022A, the recommendation control 216 may be used to recommend associated data for the target field 2024 of the target data row 2022A. In this case, the associated data recommendation request is a request to recommend data for the target field 2024 of the target data row 2022A, and the recommended associated data is to be associated into that target field 2024. In some embodiments, the associated data recommendation request may also be generated when the user 104A triggers the target data row 2022A in the page 200 of fig. 2A (e.g., clicks or double-clicks the target data row 2022A). The recommended associated data may then be associated into the target data row 2022A, for example by selecting, after the recommended associated data is obtained, the field into which the associated data is to be associated.
In this step, the feature information of the target data line 2022A may be various features for expressing the target data line 2022A, so that a data line associated with the target data line 2022A may be obtained based on the feature information. It will be appreciated that the feature information may be obtained from the target data line 2022A by any method of extracting features, as long as it is ensured that the feature information may express the characteristics of the target data line 2022A.
In some embodiments, determining the feature information of the target data row may further include extracting a plurality of keywords from the target data row and determining the feature information according to the plurality of keywords. Because the extracted keywords characterize the target data row 2022A, they can be used as feature information of the target data row 2022A.
It will be appreciated that keywords may be extracted in a variety of ways. In some embodiments, the target data row 2022A may be input into a keyword extraction model, which outputs a plurality of keywords of the target data row 2022A. The keyword extraction model may be a machine learning model (e.g., a neural network model) and may be trained using the data rows of existing data tables, so that the keywords it extracts from the target data row 2022A better express the characteristics of the target data row 2022A, thereby improving the subsequent data recommendation.
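As a rough illustration of the input and output of keyword extraction, the following Python sketch substitutes a simple frequency-based heuristic for the trained keyword extraction model described above. The function name extract_keywords, the stop-word list, the row layout, and the sample values are all assumptions made for this example only.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def extract_keywords(row: dict[str, str], top_k: int = 5) -> list[str]:
    """Return the top_k most frequent non-stop-word tokens of a data row."""
    text = " ".join(str(value) for value in row.values()).lower()
    tokens = [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOP_WORDS]
    return [word for word, _count in Counter(tokens).most_common(top_k)]


sample_row = {
    "order_number": "SO1001",
    "order_status": "paid",
    "note": "wireless mouse and wireless keyboard",
}
keywords = extract_keywords(sample_row, top_k=3)
```

A trained model could replace the body of extract_keywords without changing how the resulting keywords are used in the later matching steps.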
In some embodiments, the determining the feature information according to the plurality of keywords may further include generating a feature vector corresponding to the target data line according to the plurality of keywords, and then determining the feature information according to the plurality of keywords and the feature vector.
In this embodiment, after obtaining the plurality of keywords, the plurality of keywords may be further converted into feature vectors based on a vectorization technique, and then both the plurality of keywords and the feature vectors may be determined as feature information of the target data line 2022A. In this way, the recommendation is performed based on the two kinds of characteristic information, so that the coverage rate and/or the accuracy of the data can be improved.
It will be appreciated that various vectorization techniques may be used to generate feature vectors from keywords. Optionally, the vectorization technique may be a one-hot encoding algorithm, a vocabulary mapping algorithm (e.g., the Word2Vec algorithm), a word embedding algorithm, and so on.
In some embodiments, the feature vector may be generated directly from the target data row 2022A rather than from the plurality of keywords, so that the feature vector directly represents the feature information of the target data row 2022A without being affected by the keyword extraction algorithm, which may further improve the coverage and accuracy of the recommended data.
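One simple way to turn a set of keywords into a feature vector is to combine their one-hot codes over a fixed vocabulary (a "multi-hot" vector). The Python sketch below assumes a small, hypothetical vocabulary and a hypothetical function name multi_hot; embedding-based techniques such as Word2Vec would produce dense vectors instead and are not shown.

```python
def multi_hot(keywords: list[str], vocabulary: list[str]) -> list[float]:
    """Encode keywords as a vector with 1.0 at each vocabulary term present."""
    present = set(keywords)
    return [1.0 if term in present else 0.0 for term in vocabulary]


# Hypothetical vocabulary; each position corresponds to one known term.
vocabulary = ["keyboard", "mouse", "paid", "wireless"]
feature_vector = multi_hot(["wireless", "mouse"], vocabulary)
```

Keywords that are not in the vocabulary are simply ignored in this sketch, which is one reason dense embeddings may be preferred in practice.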
After the feature information is determined, at step 304, at least one first data row 2022 matching the feature information may be acquired in the first data table 202 according to the feature information, and/or at least one second data row 2122 matching the feature information may be acquired in a second data table 212 associated with the first data table 202 according to the feature information.
In this step, after the feature information of the target data row 2022A is obtained, matching may be performed in the first data table 202 to obtain first data rows 2022 matching the feature information, and/or matching may be performed in the second data table 212 to obtain second data rows 2122 matching the feature information. Performing feature matching in the first data table 202 or the second data table 212 in this way yields first data rows 2022 or second data rows 2122 that can be recommended, so that associated data (or associated records) can be recommended automatically and the user can find the data row to be associated among the recommendation results.
In some embodiments, based on the feature information, matching may be performed both in the first data table 202, to obtain first data rows 2022 matching the feature information, and in the second data table 212, to obtain second data rows 2122 matching the feature information. Two recall strategies are thus used to obtain the associated data for recommendation, which may improve the coverage and accuracy of the recommended data relative to a single recall strategy.
Specifically, the first data table 202 may contain first data rows 2022 that are similar to the target data row 2022A and are already associated with second data rows 2122. If the second data rows 2122 associated with these similar first data rows 2022 are used as recommended data, the existing association experience can be migrated to the target data row 2022A, which enlarges the coverage of the recommended data and improves the recall rate.
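The idea of migrating existing associations can be sketched in Python as follows, assuming that the similar first data rows have already been found and that the existing associations are available as a mapping from first-table row ids to second-table row ids. The function name migrate_associations and all ids are hypothetical.

```python
def migrate_associations(similar_first_rows: list[str],
                         existing_links: dict[str, list[str]]) -> list[str]:
    """Collect second-table rows already linked from similar first-table rows."""
    recommended: list[str] = []
    for first_id in similar_first_rows:
        for second_id in existing_links.get(first_id, []):
            if second_id not in recommended:  # keep order, drop duplicates
                recommended.append(second_id)
    return recommended


# Hypothetical existing associations: order row -> commodity rows.
existing_links = {
    "order-1": ["item-7"],
    "order-2": ["item-7", "item-9"],
    "order-3": [],
}
recommended = migrate_associations(["order-2", "order-1", "order-4"], existing_links)
```

Rows without any existing association (or unknown rows, such as "order-4" here) contribute nothing to the recommendation.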
In some embodiments, as shown in fig. 3B, the step of obtaining, according to the feature information, at least one first data row matching the feature information in the first data table and/or obtaining, according to the feature information, at least one second data row matching the feature information in a second data table associated with the first data table may further include the steps of:
at step 3042, at least one first candidate data line matching the plurality of keywords is obtained in the first data table according to the plurality of keywords.
In this step, at least one first data line 2022 matching the plurality of keywords may be found in the database corresponding to the first data table 202 as the at least one first candidate data line corresponding to the target data line 2022A based on the plurality of keywords of the target data line 2022A. Alternatively, a first data line 2022 having a degree of matching higher than a degree of matching threshold may be used as the first candidate data line.
It will be appreciated that any keyword matching algorithm may be utilized to find the first candidate data line. Alternatively, a string matching algorithm (e.g., KMP (Knuth-Morris-Pratt) algorithm), a word search tree (Trie) algorithm, a multi-pattern matching algorithm (e.g., AC automaton algorithm), and the like may be employed.
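A minimal Python sketch of keyword-based recall with a matching-degree threshold is given below. It assumes, for illustration only, that the matching degree is the fraction of keywords that occur in a row's text, and it relies on Python's built-in substring search rather than an explicit KMP, Trie, or AC automaton implementation. The names keyword_match_degree and recall_by_keywords and the threshold value are hypothetical.

```python
def keyword_match_degree(keywords: list[str], row_text: str) -> float:
    """Fraction of keywords that occur in the row text (assumed definition)."""
    if not keywords:
        return 0.0
    text = row_text.lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in text)
    return hits / len(keywords)


def recall_by_keywords(keywords: list[str], rows: dict[str, str],
                       threshold: float = 0.5) -> list[str]:
    """Return ids of rows whose matching degree is higher than the threshold."""
    return [row_id for row_id, text in rows.items()
            if keyword_match_degree(keywords, text) > threshold]


rows = {
    "r1": "wireless mouse, paid",
    "r2": "wired keyboard",
    "r3": "wireless keyboard and mouse",
}
first_candidates = recall_by_keywords(["wireless", "mouse", "paid"], rows)
```

With many keywords and many rows, a multi-pattern algorithm such as an AC automaton would avoid rescanning each row once per keyword.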
At step 3044, at least one second candidate data line matching the feature vector is obtained in the first data table according to the feature vector.
In this step, based on the feature vector of the target data line 2022A, at least one first data line 2022 matching the feature vector may be found in the vector database corresponding to the first data table 202 as the at least one second candidate data line corresponding to the target data line 2022A.
Optionally, the cosine similarity between vectors may be used as the similarity measure. Optionally, a first data row 2022 whose similarity is higher than a similarity threshold may be used as the second candidate data row.
At step 3046, the at least one first data row matching the feature information is determined from the at least one first candidate data row and the at least one second candidate data row.
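Cosine similarity is the dot product of two vectors divided by the product of their lengths. The following Python sketch applies it with a similarity threshold. An in-memory dictionary stands in for the vector database, and the names cosine_similarity and recall_by_vector and the threshold value 0.7 are assumptions for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_by_vector(query: list[float], vectors: dict[str, list[float]],
                     threshold: float = 0.7) -> list[str]:
    """Return ids of rows whose similarity exceeds the threshold, best first."""
    scored = [(row_id, cosine_similarity(query, vec))
              for row_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [row_id for row_id, score in scored if score > threshold]


# Hypothetical stored vectors standing in for a vector database.
vectors = {
    "r1": [1.0, 0.0, 1.0],
    "r2": [0.0, 1.0, 0.0],
    "r3": [1.0, 1.0, 1.0],
}
second_candidates = recall_by_vector([1.0, 0.0, 1.0], vectors)
```

A production vector database would typically use an approximate nearest-neighbor index instead of scoring every stored vector.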
In this step, the at least one first candidate data line and the at least one second candidate data line may be combined into one data set as the at least one first data line matched with the feature information, so that a range of the at least one first data line matched with the feature information may be enlarged, and a coverage rate of data may be improved.
In some embodiments, an intersection may be further taken from the at least one first candidate data line and the at least one second candidate data line as the at least one first data line matched with the feature information, so that the obtained first data line meets both a keyword matching degree requirement and a vector similarity requirement, and accuracy of data may be improved.
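The two ways of combining the candidate sets can be sketched in Python as follows: a union favors coverage, while an intersection favors accuracy. The function name merge_candidates and the mode parameter are hypothetical.

```python
def merge_candidates(keyword_hits: list[str], vector_hits: list[str],
                     mode: str = "union") -> list[str]:
    """Combine two candidate lists by union or by intersection."""
    if mode == "union":
        merged: list[str] = []
        for row_id in [*keyword_hits, *vector_hits]:
            if row_id not in merged:  # keep first occurrence only
                merged.append(row_id)
        return merged
    if mode == "intersection":
        allowed = set(vector_hits)
        return [row_id for row_id in keyword_hits if row_id in allowed]
    raise ValueError(f"unknown mode: {mode}")


union_rows = merge_candidates(["r1", "r3"], ["r3", "r4"])
common_rows = merge_candidates(["r1", "r3"], ["r3", "r4"], mode="intersection")
```

The same helper would apply unchanged to candidate data rows recalled from the second data table.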
Moreover, since the filled information of the target data line 2022A may contain rich text information and key fields, keyword extraction may be supported, which is beneficial to acquiring recommendation data through keyword matching and vector similarity calculation later.
In some embodiments, as shown in fig. 3C, the step of obtaining, according to the feature information, at least one first data row matching the feature information in the first data table and/or obtaining, according to the feature information, at least one second data row matching the feature information in a second data table associated with the first data table may further include the steps of:
at step 3048, at least one third candidate data line matching the plurality of keywords is obtained in the second data table according to the plurality of keywords.
In this step, at least one second data line 2122 matching the plurality of keywords may be found in the database corresponding to the second data table 212 as the at least one third candidate data line corresponding to the target data line 2022A based on the plurality of keywords of the target data line 2022A. Optionally, a second data line 2122 with a degree of matching higher than a degree of matching threshold may be used as the third candidate data line.
It will be appreciated that any keyword matching algorithm may be utilized to find the third candidate data line. Alternatively, a string matching algorithm (e.g., KMP (Knuth-Morris-Pratt) algorithm), a word search tree (Trie) algorithm, a multi-pattern matching algorithm (e.g., AC automaton algorithm), and the like may be employed.
At step 3050, at least one fourth candidate data line matching the feature vector is obtained in the second data table according to the feature vector.
In this step, based on the feature vector of the target data line 2022A, at least one second data line 2122 matching the feature vector may be found in the vector database corresponding to the second data table 212 as the at least one fourth candidate data line corresponding to the target data line 2022A.
Optionally, the cosine similarity between vectors may be used as the similarity measure. Optionally, a second data row 2122 whose similarity is higher than a similarity threshold may be used as the fourth candidate data row.
At step 3052, the at least one second data row matching the feature information is determined from the at least one third candidate data row and the at least one fourth candidate data row.
In this step, the at least one third candidate data line and the at least one fourth candidate data line may be combined into one data set as the at least one second data line matched with the feature information, so that the range of the at least one second data line matched with the feature information may be enlarged, and the coverage rate of data may be improved.
In some embodiments, an intersection may be further taken from the at least one third candidate data line and the at least one fourth candidate data line as the at least one second data line matched with the feature information, so that the obtained second data line meets both the keyword matching degree requirement and the vector similarity requirement, and the accuracy of the data may be improved.
In addition to the foregoing recall strategy, in some embodiments, as shown in FIG. 3D, the method 300 provides a recall strategy based on recently edited records, and may further include the steps of:
at step 3054, at least one first edited data row in the first data table 202 that was edited within a first preset time period (e.g., within 3 days, within 5 days, within a week, etc.) before the current time is determined.
In this step, a first data row in the first data table 202 that has been edited within the first preset time period (e.g., within 3 days, within 5 days, within a week, etc.) before the current time may be determined as the first edited data row.
At step 3056, at least one data row in the second data table 212 associated with the target field (e.g., a commodity library field) of the at least one first edited data row is determined to be at least one third candidate data row.
In this step, after the at least one first edited data row is found, it may be determined whether the target field (e.g., the commodity library field) of the first edited data row is associated with a certain second data row 2122 in the second data table 212, and if the target field of the first edited data row is associated with a certain second data row 2122 in the second data table 212, the certain second data row 2122 may be used as the third candidate data row.
In this embodiment, a data line recently edited by the user is likely to resemble the current target data line 2022A in subject or requirement, and it reflects records that are currently popular or active. Recommending based on the at least one third candidate data line can therefore further improve the coverage rate of the recommended data and the recall rate.
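Steps 3054 and 3056 can be sketched as follows. This is only an illustration under stated assumptions: each first-table row is modeled as a dictionary with an `edited_at` timestamp and a list of linked second-table row identifiers stored under the target field; the names `recall_recently_edited`, `edited_at`, and `commodity` are hypothetical.

```python
from datetime import datetime, timedelta


def recall_recently_edited(first_table, now, window=timedelta(days=3),
                           target_field="commodity"):
    """Return IDs of second-table rows linked from recently edited first-table rows.

    Duplicates are kept on purpose so that a later step can count occurrences.
    """
    cutoff = now - window
    recalled = []
    for row in first_table:
        if row["edited_at"] >= cutoff:            # step 3054: edited within the window
            recalled.extend(row.get(target_field, []))  # step 3056: follow the link
    return recalled


rows = [
    {"edited_at": datetime(2024, 1, 9), "commodity": ["A"]},
    {"edited_at": datetime(2024, 1, 1), "commodity": ["B"]},  # too old
    {"edited_at": datetime(2024, 1, 8), "commodity": []},     # nothing linked
]
recent = recall_recently_edited(rows, now=datetime(2024, 1, 10))
```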
In some embodiments, as shown in FIG. 3E, the method 300 also provides a recall strategy based on similar associated records, and may further include the steps of:
At step 3058, in response to a third data row already being associated in the target field of the target data row 2022A, at least one other data row in the first data table 202 that is associated with the third data row in the target field is determined.
In this embodiment, the target field may be a field capable of associating multiple records. If one record (i.e., the third data row) is already associated in the target field of the target data row 2022A, other data rows that, like the target data row 2022A, are associated with that record in the target field may be found in the first data table 202.
It will be appreciated that the data table associated with the target field is the second data table 212, and thus the data row already associated in the target field of the target data row 2022A also belongs to the second data table 212. It is called the third data row here only to distinguish it by name from the second data row; in fact, the third data row may itself be a second data row 2122 in the second data table 212.
For clarity, taking the case where the second data table 212 is a commodity table as an example, the third data row may be the second data row corresponding to commodity A in the second data table 212. In this step, the first data rows associated with the second data row corresponding to commodity A may be searched for in the first data table 202 and used as the other data rows.
At step 3060, in response to the number of the at least one other data row being greater than a first number (e.g., 2, 3, 4, 5, or 10), at least one fourth data row that is also associated in the target field of the at least one other data row is determined.
Still taking the commodity table as an example, in this step, when the number of first data rows associated with the second data row corresponding to commodity A is greater than the first number, the second data row corresponding to commodity A appears frequently among the association records of the first data table 202. It may then be further determined whether those other data rows are also associated with at least one fourth data row (for example, the second data row corresponding to commodity B). Here, the fourth data row is so named only to distinguish it from the second data row; in practice, the fourth data row may be a second data row 2122 in the second data table 212.
At step 3062, it is determined, for each of the fourth data rows, whether the number of the at least one other data row with which the fourth data row is associated is greater than a second number (e.g., 2, 3, 4, 5, 10, etc.).
Still taking the commodity table as an example, in this step, it may be determined whether the number of the other data rows associated with the second data row corresponding to commodity B is greater than the second number, so as to determine whether the second data row corresponding to commodity B appears with high frequency among the other data rows.
At step 3064, each fourth data row for which the number of associated other data rows is greater than the second number is determined to be a fourth candidate data row, yielding at least one fourth candidate data row.
Still taking the commodity table as an example, in this step, when the number of the other data rows associated with the second data row corresponding to commodity B is greater than the second number, it is determined that the second data row corresponding to commodity B appears with high frequency among the other data rows, so that this second data row can be recommended to the user.
In this embodiment, because multiple first data rows 2022 of the first data table 202 have each selected multiple association records (i.e., are associated with multiple second data rows), enough co-occurrence records accumulate to form reliable association rules, which improves the recall rate and the recommendation accuracy.
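Steps 3058 to 3064 amount to a co-occurrence count with two thresholds. The sketch below is a simplified illustration, assuming each first-table row other than the target row is reduced to the list of second-table row identifiers linked in its target field; `recall_co_occurring` and its parameter names are hypothetical.

```python
from collections import Counter


def recall_co_occurring(other_rows_links, third_row_id, first_number, second_number):
    """Return second-table row IDs that frequently co-occur with third_row_id.

    other_rows_links: one list of linked second-table row IDs per first-table row
                      (the target row itself is assumed to be excluded).
    """
    # Step 3058: rows that also link the already-associated (third) row
    others = [links for links in other_rows_links if third_row_id in links]
    # Step 3060: proceed only if there are enough such rows
    if len(others) <= first_number:
        return []
    # Steps 3060/3062: count, per co-linked row, how many "other" rows link it
    counts = Counter(
        rid
        for links in others
        for rid in dict.fromkeys(links)   # count each row at most once per record
        if rid != third_row_id
    )
    # Step 3064: keep rows whose count exceeds the second threshold
    return [rid for rid, n in counts.items() if n > second_number]


links = [["A", "B"], ["A", "B"], ["A", "B", "C"], ["D"]]
frequent = recall_co_occurring(links, "A", first_number=2, second_number=2)
```

Here commodity B is linked in three of the three rows that link commodity A and is recalled, while commodity C appears only once and is not.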
Returning to FIG. 3A, after some candidate data lines are obtained, recommended association data for the target data line may be determined at step 306 based on the at least one first data line and/or the at least one second data line that match the characteristic information.
Optionally, after obtaining the at least one first data line and/or the at least one second data line matching the characteristic information, recommendation related data for the target data line 2022A may be determined based on these data lines.
In some embodiments, the associated data recommendation request includes a request to recommend associated data for a target field of the target data row.
As shown in FIG. 3F, determining the recommended association data of the target data line according to the at least one first data line and/or the at least one second data line matched with the feature information may further include the steps of:
At step 3066, at least one data row in the second data table associated with the target field of the at least one first data row that matches the characteristic information is determined to be at least one first data row to be selected.
In this step, a data row of a second data table associated with the target field of the at least one first data row matched with the feature information may be found, and then the data row is used as the first data row to be selected.
For the sake of clarity, taking the commodity table as an example, assuming that a plurality of first data rows matching the feature information are found in the first data table 202, a target field (e.g., a commodity library field) may be found therefrom in association with one or some second data rows in the commodity table, and then these second data rows with associated records are taken as the first candidate data rows.
At step 3068, the at least one second data line matching the characteristic information is determined as at least one second candidate data line.
In this step, the at least one second data line found from the second data table 212 that matches the characteristic information may be directly determined as the at least one second candidate data line.
In step 3070, the recommended association data corresponding to the target field of the target data row is determined according to the at least one first candidate data row and the at least one second candidate data row.
In this step, the at least one first candidate data row and the at least one second candidate data row are both second data rows 2122 in the second data table 212, and the data available for association may be recommended for the target field of the target data row 2022A based on them. Two recall strategies are fused in the data to be recommended, so that the coverage rate of the recommended data can be improved.
As mentioned above, in some embodiments, the step of determining the recommended association data corresponding to the target field of the target data row may further include determining that data according to the at least one first candidate data row, the at least one second candidate data row, and the at least one third candidate data row; that is, the at least one third candidate data row obtained by the recall strategy based on recently edited records is also used as data to be recommended.
In some embodiments, the step of determining the recommended association data corresponding to the target field of the target data row according to the at least one first, the at least one second, and the at least one third candidate data row may further comprise the following step:
The recommended association data corresponding to the target field of the target data row is determined according to the at least one first candidate data row, the at least one second candidate data row, the at least one third candidate data row, and the at least one fourth candidate data row; that is, the at least one fourth candidate data row obtained by the recall strategy based on similar associated records is also used as data to be recommended, which can further improve the coverage rate and the accuracy of the recommended data.
It can be understood that four recall policies in the embodiments of the present disclosure may be arranged and combined to obtain multiple fusion recall policies, and these fusion policies may all be used as recall policies in the embodiments of the present disclosure to obtain data to be recommended, so that compared with a single recall policy, a higher recall rate may be obtained, and the data coverage rate is improved, thereby being applicable to a wider application scenario.
In some embodiments, the at least one first line of data to be selected and the at least one second line of data to be selected are data obtained based on feature matching, both of which may be filtered in order to further improve data accuracy.
As shown in fig. 3G, the determining the recommendation associated data corresponding to the target field of the target data row according to the at least one first candidate data row, the at least one second candidate data row, the at least one third candidate data row, and the at least one fourth candidate data row may further include the following steps:
At step 30702, the at least one first line of data to be selected is input to a first screening model to obtain at least one first line of screening data.
Optionally, the first screening model is a large language model (LLM) that screens the at least one first data line to be selected based on a first prompt, where the first prompt includes task information for screening the at least one first data line to be selected based on the association relationship between the first data line to be selected and the first data table.
Since the first line of data to be selected is obtained based on the first line of data 2022 screened from the first data table 202, the association relationship between the first line of data to be selected and the first data table needs to be considered when screening the first line of data to be selected. In this embodiment, task information for screening the at least one first data line based on the association relationship between the first data line and the first data table is added to the first prompt word, so that the first screening model may screen the at least one first data line based on the association relationship between the first data line and the first data table, so as to obtain more accurate data to be recommended, that is, the at least one first screening data line.
At step 30704, the at least one second line of candidate data is input to a second screening model to obtain at least one second line of screening data.
The second screening model is a large language model, and screens the at least one second data line to be selected based on a second prompt word (prompt), wherein the second prompt word (prompt) comprises task information for screening the at least one second data line to be selected based on an association relationship between the first data table and the second data table.
Since the second data line to be selected is obtained from the second data lines 2122 screened from the second data table 212 based on the characteristic information of the target data line 2022A, the association relationship between the first data table and the second data table needs to be considered when screening the second data line to be selected. In this embodiment, task information for screening the at least one second candidate data row based on the association relationship between the first data table and the second data table is added to the second prompt, so that the second screening model may screen the at least one second candidate data row based on the association relationship between the first data table 202 and the second data table 212, thereby obtaining more accurate data to be recommended, that is, the at least one second screening data row.
In step 30706, the recommendation associated data corresponding to the target field of the target data row is determined according to the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row.
In this embodiment, more accurate data to be recommended can be obtained by screening the first data line to be selected and the second data line to be selected.
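The prompt-based screening of steps 30702 and 30704 can be sketched without committing to any particular model. In the illustration below, `llm` is a hypothetical callable that takes a prompt string and returns a text reply; the prompt wording, the index-based reply format, and the name `screen_candidates` are all assumptions made for the example and are not specified by the disclosure.

```python
def screen_candidates(candidates, task_info, llm):
    """Ask a model which candidate rows to keep.

    candidates: list of textual descriptions of candidate rows
    task_info:  task description placed in the prompt (e.g., the relationship
                between the two tables that the model should take into account)
    llm:        any callable mapping a prompt string to a reply string
    """
    listing = "\n".join(f"{i}: {text}" for i, text in enumerate(candidates))
    prompt = (
        f"{task_info}\n"
        f"Candidates:\n{listing}\n"
        "Reply with the indices of the candidates to keep, separated by commas."
    )
    reply = llm(prompt)
    # Parse the reply defensively: keep only tokens that are plain integers
    keep = {int(tok) for tok in reply.replace(",", " ").split() if tok.isdigit()}
    return [c for i, c in enumerate(candidates) if i in keep]


# A stand-in model for demonstration; a real deployment would call an actual LLM.
def fake_llm(prompt):
    return "0, 2"


kept = screen_candidates(["row x", "row y", "row z"], "Keep relevant rows.", fake_llm)
```

The first and second screening models would differ only in the `task_info` passed in, matching the different association relationships described above.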
In some embodiments, as shown in fig. 3H, the determining the recommendation associated data corresponding to the target field of the target data row according to the at least one first filtering data row, the at least one second filtering data row, the at least one third candidate data row, and the at least one fourth candidate data row may further include the following steps:
At step 30708, at least one repeated data line (i.e., repeated identical data lines) of the at least one first screening data line, the at least one second screening data line, the at least one third candidate data line, and the at least one fourth candidate data line is determined.
At step 30710, one of the at least one repeated data line is retained and the remaining repeated data lines are deleted.
In this step, one of the identical data lines repeatedly appearing for each group is retained, and then the remaining data lines are deleted, in other words, only one of the identical data lines repeatedly appearing for each group is retained as a representative.
It will be appreciated that multiple sets of repeated identical data lines may occur, with each set of repeated identical data lines being treated as described above, i.e. with only one remaining representative.
Optionally, after the deduplication process, information such as the remaining multipath identifier of the duplicate data line (i.e., the identifier of the other duplicate data line that is deleted), the occurrence frequency of the set of duplicate data lines, and the latest update timestamp of the set of duplicate data lines may also be recorded, so that the information may be used for further processing.
In step 30712, the recommendation associated data corresponding to the target field of the target data row is determined according to the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row that delete the remaining repeated data rows.
By the embodiment, the data to be recommended can be subjected to duplicate removal processing, so that the same data row cannot appear in the final recommendation result, and the user experience is improved.
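The deduplication of steps 30708 to 30712, together with the optional bookkeeping described above, can be sketched as follows. The sketch assumes each recalled item is a `(row_id, source, edited_at)` tuple, where `source` names the recall strategy that produced it; interpreting the recorded multipath identifier as the list of recall paths is an assumption, and all names are hypothetical.

```python
def deduplicate(recalled):
    """Keep one representative per row ID and record merge metadata.

    recalled: iterable of (row_id, source, edited_at) tuples in recall order.
    Returns a list of dicts, one per distinct row, in first-seen order.
    """
    merged = {}
    for row_id, source, edited_at in recalled:
        entry = merged.setdefault(
            row_id,
            {"row_id": row_id, "sources": [], "count": 0, "latest": edited_at},
        )
        if source not in entry["sources"]:
            entry["sources"].append(source)                # which recall paths hit this row
        entry["count"] += 1                                # occurrence frequency of the group
        entry["latest"] = max(entry["latest"], edited_at)  # latest update timestamp
    return list(merged.values())


unique = deduplicate([("A", "first", 1), ("B", "second", 5), ("A", "third", 3)])
```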
In some embodiments, as shown in fig. 3I, the determining the recommendation associated data corresponding to the target field of the target data row according to the at least one first filtering data row, the at least one second filtering data row, the at least one third candidate data row, and the at least one fourth candidate data row may further include the following steps:
In step 30714, sorting the at least one first screening data line according to the occurrence number of the same data line in the at least one first screening data line and/or the editing time of each first screening data line, so as to obtain a first internal sorting corresponding to the at least one first screening data line.
Since a first screening data line is obtained by matching first data lines 2022 in the first data table 202 against the feature information and then finding the second data lines 2122 associated with them, different matched first data lines 2022 may be associated with the same second data line 2122. Therefore, in this step, the ranking may be performed according to the number of occurrences of the same data line in the at least one first screening data line.
For example, still taking the commodity table as an example, one of the at least one first screening data row is the second data row corresponding to commodity A, which actually appeared 5 times before the deduplication processing, and another of the at least one first screening data row is the second data row corresponding to commodity B, which actually appeared 3 times before the deduplication processing. The second data row corresponding to commodity A may therefore be recommended ahead of the second data row corresponding to commodity B.
In some embodiments, the ordering may also take the latest edit time into account. For example, first screening data lines having the same number of occurrences may be sorted by edit time from most recent to least recent, i.e., the most recently edited data line is ranked first.
In this way, the at least one first screening data row realizes the internal sorting according to the first internal sorting, so that the data row which meets the requirement of the user can be preferentially recommended to the user.
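The first internal sorting (occurrence count first, then edit time as a tie-breaker) can be sketched with a compound sort key. The dictionary keys `count` and `edited_at` and the function name are hypothetical; the same function would apply equally to the third internal sorting, which uses the same two criteria.

```python
def sort_by_count_then_recency(rows):
    """Sort rows by occurrence count (descending), then by edit time (newest first)."""
    return sorted(rows, key=lambda r: (r["count"], r["edited_at"]), reverse=True)


ranked_rows = sort_by_count_then_recency([
    {"row_id": "B", "count": 3, "edited_at": 20},
    {"row_id": "A", "count": 5, "edited_at": 10},
    {"row_id": "C", "count": 3, "edited_at": 30},
])
```

In this example commodity A (5 occurrences) comes first, and C precedes B because both occur 3 times but C was edited more recently.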
In step 30716, the at least one second screening data row is ranked according to the matching degree of the at least one second screening data row corresponding to the plurality of keywords and/or the similarity of the at least one second screening data row and the feature vector, so as to obtain a second internal ranking corresponding to the at least one second screening data row.
Since the second screening data line is obtained by matching the feature information (keywords and/or feature vectors) in the second data table 212 to obtain the second data line 2122, the ranking may be based on the matching degree of the second screening data line with the plurality of keywords and/or the similarity with the feature vectors.
For example, the ranking may first be performed based on the keyword matching degree, and then, for data rows having the same keyword matching degree, based on the feature vector similarity. Or conversely, the ranking may first be performed based on the feature vector similarity, and then, for data rows having the same feature vector similarity, based on the keyword matching degree.
In this way, the at least one second screening data row realizes the internal sorting according to the second internal sorting, so that the data row which meets the requirement of the user can be preferentially recommended to the user.
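The second internal sorting can be sketched in the same way, with a switch for which score acts as the primary key. The score fields `keyword_score` and `vector_score` and the function name `sort_by_match` are hypothetical, and how the scores themselves are computed is outside the scope of this sketch.

```python
def sort_by_match(rows, primary="keyword"):
    """Sort rows by one score and break ties with the other (both descending)."""
    if primary == "keyword":
        first, second = "keyword_score", "vector_score"
    else:
        first, second = "vector_score", "keyword_score"
    return sorted(rows, key=lambda r: (r[first], r[second]), reverse=True)


scored = [
    {"row_id": "A", "keyword_score": 2, "vector_score": 0.70},
    {"row_id": "B", "keyword_score": 2, "vector_score": 0.90},
    {"row_id": "C", "keyword_score": 1, "vector_score": 0.95},
]
by_keyword = [r["row_id"] for r in sort_by_match(scored, "keyword")]
by_vector = [r["row_id"] for r in sort_by_match(scored, "vector")]
```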
In step 30718, sorting the at least one third data line to be selected according to the occurrence number of the same data line in the at least one third data line to be selected and/or the editing time of each third data line to be selected, so as to obtain a third internal sorting corresponding to the at least one third data line to be selected.
Since the third candidate data line is obtained by selecting at least one edited data line from the first data table 202 and then determining the associated second data line in the target field of the edited data line, it will be understood that the second data lines 2122 associated with the edited data lines may be identical, and thus, in this step, the sorting may be performed according to the occurrence number of the identical data line in the at least one third candidate data line.
For example, still taking the commodity table as an example, one of the at least one third candidate data row is the second data row corresponding to commodity A, which actually appeared 5 times before the deduplication processing, and another of the at least one third candidate data row is the second data row corresponding to commodity B, which actually appeared 3 times before the deduplication processing. The second data row corresponding to commodity A may therefore be recommended ahead of the second data row corresponding to commodity B.
In some embodiments, the ordering may also take the latest edit time into account. For example, third candidate data lines having the same number of occurrences may be sorted by edit time from most recent to least recent, i.e., the most recently edited data line is ranked first.
In this way, the at least one third candidate data line realizes the internal sorting according to the third internal sorting, so that the data line more meeting the user requirement can be preferentially recommended to the user.
In step 30720, sorting the at least one fourth to-be-selected data row according to the occurrence probability of each fourth to-be-selected data row in the at least one fourth to-be-selected data row, so as to obtain a fourth internal sorting corresponding to the at least one fourth to-be-selected data row.
Since the fourth candidate data row is obtained based on the recall policy that there is a co-occurrence record with the target data row 2022A in the first data table 202, the fourth candidate data row may be sorted by calculating a conditional probability with respect to the target data row 2022A and then based on the conditional probability.
For example, taking the commodity table as an example, if the second data row corresponding to commodity A is already associated in the target data row 2022A, and one of the at least one fourth candidate data row is the second data row corresponding to commodity B, the conditional probability that commodity B is also associated given that commodity A is associated may be calculated and taken as the conditional probability of the second data row corresponding to commodity B. In this way, the conditional probability of each fourth candidate data line can be calculated, and the fourth candidate data lines can then be ordered based on these probabilities.
In this way, the at least one fourth candidate data line realizes the internal sorting according to the fourth internal sorting, so that the data line more meeting the user requirement can be preferentially recommended to the user.
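One plausible way to estimate the conditional probability used for the fourth internal sorting is the empirical ratio of rows that link both commodities to rows that link the given commodity. The sketch below makes that assumption explicit; the disclosure does not prescribe a particular estimator, and the function names are hypothetical.

```python
def conditional_probability(rows_links, given_id, candidate_id):
    """Estimate P(candidate linked | given linked) from first-table link lists."""
    with_given = [links for links in rows_links if given_id in links]
    if not with_given:
        return 0.0
    both = sum(candidate_id in links for links in with_given)
    return both / len(with_given)


def sort_by_conditional_probability(rows_links, given_id, candidate_ids):
    """Order candidates by their estimated conditional probability, highest first."""
    return sorted(
        candidate_ids,
        key=lambda c: conditional_probability(rows_links, given_id, c),
        reverse=True,
    )


history = [["A", "B"], ["A", "B"], ["A", "C"], ["D"]]
p_b_given_a = conditional_probability(history, "A", "B")
ordered = sort_by_conditional_probability(history, "A", ["C", "B"])
```

With this history, B is linked in two of the three rows that link A and C in one of them, so B is ranked ahead of C.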
At step 30722, an external ordering between the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row is determined.
In this step, the at least one first screening data line, the at least one second screening data line, the at least one third candidate data line, and the at least one fourth candidate data line may also be prioritized (i.e., externally ordered). That is, the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row are taken as four sets, and priorities are set between the sets, so that after the four sets are combined, it can be determined which set of data rows can be recommended preferentially.
In some embodiments, the external ordering is in the order of the fourth internal ordering, the second internal ordering, the first internal ordering, the third internal ordering, i.e., preferentially recommending the at least one fourth line of data to be selected (and sequentially recommending according to its fourth internal ordering), followed by the at least one second line of screening data, followed by the at least one first line of screening data, and finally the at least one third line of data to be selected.
In this embodiment, the fourth candidate data lines are recommended based on co-occurrence association records and are therefore the most likely to meet the user requirement. Next, the second screening data rows, obtained by directly matching the characteristic information in the second data table 212, can also meet the user requirement well. Then, the first screening data rows, that is, the second data rows found through first data rows in the first data table 202 that are similar to the target data row, reflect the user requirement to a certain extent. Finally, the third candidate data rows, that is, the second data rows found through the most recently edited first data rows in the first data table 202, are less likely to meet the user requirement but still play a certain role in the recommendation.
In step 30724, sorting the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row based on the external sorting, the first internal sorting, the second internal sorting, the third internal sorting, and the fourth internal sorting, to obtain the recommendation related data.
In this embodiment, the recommendation related data adopts a special ordering rule to complete the ordering, so that data rows more likely to meet the user requirements can be ordered in front for priority recommendation, and further the recommendation effect can be improved.
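Steps 30722 and 30724 can be sketched as a concatenation of the four internally sorted groups in the external order described above (fourth, second, first, third). The group names and the function `merge_ranked` are hypothetical; skipping rows that were already emitted by a higher-priority group is an assumption consistent with the deduplication described earlier.

```python
EXTERNAL_ORDER = ("fourth", "second", "first", "third")


def merge_ranked(groups, external_order=EXTERNAL_ORDER):
    """Concatenate internally sorted groups according to the external ordering.

    groups: dict mapping a group name to its internally sorted list of row IDs.
    """
    seen, merged = set(), []
    for name in external_order:
        for row_id in groups.get(name, []):
            if row_id not in seen:       # a row keeps its highest-priority position
                seen.add(row_id)
                merged.append(row_id)
    return merged


final_ranking = merge_ranked({
    "first": ["A", "B"],
    "second": ["C"],
    "third": ["D", "A"],
    "fourth": ["B"],
})
```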
It will be appreciated that the above embodiments are described in terms of the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row being obtained, and in fact, each recall policy described above may have certain starting conditions, so that recalled data may be more satisfactory or have a higher recall rate.
Thus, in some embodiments, according to the characteristic information, obtaining at least one first data row in the first data table that matches the characteristic information and/or obtaining at least one second data row in a second data table associated with the first data table that matches the characteristic information may further comprise the steps of:
In response to the target data line 2022A being not empty (i.e., the target data line 2022A may extract the characteristic information) and the first data table 202 including the first data line 2022 for which the association operation has been performed (i.e., the first data line 2022 for which an association record already exists in the first data table 202), the at least one first data line matching the characteristic information may be acquired in the first data table according to the characteristic information.
Thus, the recall strategy is started when the starting condition is met, and the recall rate can be improved.
In some embodiments, according to the feature information, acquiring at least one first data row matching the feature information in the first data table and/or acquiring at least one second data row matching the feature information in a second data table associated with the first data table may further comprise the steps of:
In response to the target data row 2022A being not empty (i.e., characteristic information can be extracted from the target data row 2022A) and the number of second data rows 2122 of the second data table 212 being greater than a fourth number (e.g., 5000, 10000, etc.), the at least one second data row 2122 matching the characteristic information is acquired in the second data table 212 associated with the first data table 202 according to the characteristic information.
Thus, when the number of second data rows 2122 of the second data table 212 is larger, the recall policy can be adopted to obtain data rows more meeting the user requirement, and the recall rate can be improved.
In some embodiments, the determining at least one first edited data row in the first data table that is generated within a first preset time period from the current time may further include:
The at least one first edited data row is acquired in response to the first data table including the at least one first edited data row generated within the first preset time period from a current time and the target field of the first edited data row having performed an association operation (i.e., having a second data row associated with its target field).
Thus, the recall strategy is started when the starting condition is met, and the recall rate can be improved.
In some embodiments, the determining, in response to a third data row having been associated in the target field of the target data row, at least one other data row in the first data table that is associated with the third data row in the target field may further include:
In response to determining that the target field allows association of a plurality of the second data rows (i.e., the field allows multiple records to be added), that the proportion of first data rows in the first data table whose target field is associated with a plurality of second data rows is higher than a predetermined proportion (e.g., such first data rows account for 20%, 30%, or 50% of the total number of data rows of the first data table), and that the number of first data rows of the first data table is greater than a fifth number (e.g., 20 or 50), at least one other data row of the first data table that is associated with the third data row in the target field is determined.
Therefore, the recall strategy is started when the starting condition is met, so that the co-occurrence record can better meet the requirements of users.
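The four starting conditions described above can be collected into a single gate that reports which recall strategies are eligible to run. This is a sketch under stated assumptions: the inputs are taken to be precomputed flags and counts, the default thresholds are simply example values mentioned in the text, and every name is hypothetical.

```python
def enabled_strategies(target_has_features, first_table_has_links, second_table_size,
                       has_recent_linked_edits, allows_multiple, multi_link_ratio,
                       first_table_size, fourth_number=5000, min_ratio=0.2,
                       fifth_number=20):
    """Return the names of the recall strategies whose starting conditions hold."""
    enabled = []
    # Similar-row recall: target row is non-empty and some rows already have links
    if target_has_features and first_table_has_links:
        enabled.append("similar_rows")
    # Direct matching in the second table: target non-empty and table large enough
    if target_has_features and second_table_size > fourth_number:
        enabled.append("direct_match")
    # Recent-edit recall: recently edited rows exist and have links in the target field
    if has_recent_linked_edits:
        enabled.append("recent_edits")
    # Co-occurrence recall: multi-link field, enough multi-link rows, enough rows overall
    if allows_multiple and multi_link_ratio > min_ratio and first_table_size > fifth_number:
        enabled.append("co_occurrence")
    return enabled


active = enabled_strategies(
    target_has_features=True, first_table_has_links=True, second_table_size=1200,
    has_recent_linked_edits=False, allows_multiple=True, multi_link_ratio=0.35,
    first_table_size=80,
)
```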
It will be appreciated that, with a small probability, all of the recall policies described above may be triggered and still fail to recall any data. Therefore, in some embodiments, a fallback recall policy is also provided. Specifically, the method may further include the following step:
In response to failing to acquire the at least one first screening data row, the at least one second screening data row, the at least one third candidate data row, and the at least one fourth candidate data row, at least one second edited data row in the second data table 212 within a second preset time period (e.g., 1 day, 2 days, 3 days, 5 days, etc.) from a current time or a sixth number (e.g., the last 20) of third edited data rows closest to the current time is determined as the recommendation-related data of the target data row.
In this way, by using the most recently edited second data rows 2122 in the second data table 212 as fallback recommendation data, normal execution of the recommendation function can be ensured.
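The two variants of this last-resort policy (rows edited within a time window, or the N most recently edited rows) can be sketched as one function. The row representation, the `window` and `sixth_number` parameters, and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def recall_most_recent(second_table, now, window=None, sixth_number=20):
    """Return second-table row IDs, newest first.

    If `window` is given, return rows edited within that period before `now`;
    otherwise return the `sixth_number` most recently edited rows.
    """
    newest_first = sorted(second_table, key=lambda r: r["edited_at"], reverse=True)
    if window is not None:
        return [r["row_id"] for r in newest_first if now - r["edited_at"] <= window]
    return [r["row_id"] for r in newest_first[:sixth_number]]


table = [
    {"row_id": "A", "edited_at": datetime(2024, 1, 1)},
    {"row_id": "B", "edited_at": datetime(2024, 1, 9)},
    {"row_id": "C", "edited_at": datetime(2024, 1, 5)},
]
today = datetime(2024, 1, 10)
within_two_days = recall_most_recent(table, today, window=timedelta(days=2))
latest_two = recall_most_recent(table, today, sixth_number=2)
```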
Returning to FIG. 3A, after the recommended association data is obtained, at step 308, an associated data recommendation result for the target data row 2022A may be generated from the recommended association data.
In this step, a portion of the recommended association data may be selected as the associated data recommendation result of the target data row 2022A. Because the recommended association data is obtained based on the fused recall strategies, the specific screening strategy, and/or the sorting strategy, the generated associated data recommendation result can better meet user requirements.
In some embodiments, generating the associated data recommendation result for the target data row based on the recommended association data may further include selecting a third number (e.g., 3, 5, etc.) of data rows from the recommended association data as the associated data recommendation result for the target data row based on the external ordering, the first internal ordering, the second internal ordering, the third internal ordering, and the fourth internal ordering.
In this embodiment, a third number of data rows are selected from the recommended association data according to the external ordering and the first to fourth internal orderings, so that higher-ranked data rows are recommended to the user preferentially and user requirements are better satisfied.
Fig. 4A shows a schematic diagram of an exemplary page 200, according to an embodiment of the present disclosure.
As shown in FIG. 4A, in some embodiments, when the user 104A triggers the recommendation control 216, the associated data recommendation result, comprising a plurality of recommended data rows 220, may be displayed in the first page 210.
As shown in FIG. 4A, optionally, a change control 222 may be included in the first page 210. After the user 104A triggers the change control 222, the terminal device 102A may select the next third number of data rows, in order, from the recommended association data to replace the currently displayed recommended data rows 220. The displayed data rows are thereby changed, so that the user 104A can more easily find the data row that the user wants to associate.
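As a non-limiting illustration, the behavior of the change control might be sketched as follows. The function name `next_batch`, the use of an integer offset to track the current position, and the wrap-around to the start of the list once it is exhausted are assumptions made for this sketch; the disclosure only states that batches are selected sequentially.

```python
def next_batch(ranked_rows, offset, batch_size=3):
    """Return (batch, new_offset) for the batch that starts at `offset`.

    ranked_rows: the recommended association data, already sorted.
    batch_size: the "third number" of rows shown at a time.
    The offset wraps to 0 when the end of the list is reached (assumption).
    """
    if not ranked_rows:
        return [], 0
    start = offset % len(ranked_rows)
    batch = ranked_rows[start:start + batch_size]
    new_offset = (start + len(batch)) % len(ranked_rows)
    return batch, new_offset

ranked = ["a", "b", "c", "d", "e", "f", "g"]
first, offset = next_batch(ranked, 0)        # initial display
second, offset = next_batch(ranked, offset)  # after one trigger of the change control
```

Each trigger of the change control would call `next_batch` with the offset returned by the previous call.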
As shown in FIG. 4A, in some embodiments, each data row begins with a check box 2202, and the user 104A can associate the data row corresponding to a check box 2202 with the target data row 2022A by checking that check box 2202 and clicking the OK control 250, thereby completing the data association. Optionally, as shown in FIG. 4A, when the user 104A checks the check box 2202 of a recommended data row 220, the check box of the corresponding second data row 2122 in the second data table 212 is also selected. This indicates to the user 104A that the check operation is directed to a particular second data row 2122 in the second data table 212, and keeps the recommended data row and the second data row consistent.
Fig. 4B shows another schematic diagram of an exemplary page 200 according to an embodiment of the disclosure.
In some embodiments, after the user 104A selects a recommended data row from the associated data recommendation result to perform the association operation (e.g., the user 104A checks the check box 2202 corresponding to a particular second data row and clicks the OK control 250), as shown in FIG. 4B, the second data row is associated with the target data row 2022A of the first data table 202, and information of the second data row is displayed in the cell 2024A of the target data row 2022A corresponding to the target field 2024. Still taking the second data table 212 as a commodity table, as shown in FIG. 4A, suppose the user 104A selects the first recommended data row in the associated data recommendation result and the commodity corresponding to that recommended data row is a necklace. In this case, as shown in FIG. 4B, the name of the commodity (e.g., "necklace") may be displayed in the cell 2024A corresponding to the target field 2024 of the target data row 2022A, so that the user 104A knows that the target data row 2022A has been associated with the commodity "necklace" in the commodity table. It will be appreciated that the information displayed in the cell 2024A may also be other information that characterizes the content or features of the associated second data row. For example, where the second data table 212 is a commodity table, the information displayed in the cell 2024A may be other information that characterizes a commodity, such as a commodity identification (ID).
In some embodiments, when the above-described association operation is a one-way association, the user 104A may also view the information of the associated second data row by triggering the cell 2024A of the target data row 2022A, so that the user can conveniently view detailed information of the data rows in other data tables associated with the target data row. For example, by hovering a mouse pointer over the cell 2024A, the entire row or a portion of the content of the second data row associated with the cell 2024A may be displayed at a position adjacent to the cell 2024A (e.g., above or below it); when the content exceeds the page display range, a portion of the content may be displayed and the remainder reached with a scroll bar. For another example, when the cell 2024A is clicked or double-clicked, the display may jump to the page of the second data table 212, in which the entire row or a portion of the content of the second data row associated with the cell 2024A is shown and highlighted.
In some embodiments, when the above-mentioned association operation is a bidirectional association, in addition to the functions available under the unidirectional association described above, the information of the target data row 2022A may be viewed from the second data row of the second data table 212 that corresponds to the target data row 2022A, so that the user can conveniently view detailed information of the target data row 2022A from other data tables that are bidirectionally associated with it. Taking the first data table 202 as an order table and the second data table 212 as a commodity table as an example, and referring to FIG. 2B, the second data row 2122 of the second data table 212 may include an order number field. When the target data row 2022A is bidirectionally associated with the second data row, the order number field of the second data row 2122 is correspondingly associated with the target data row 2022A of the order table (the first data table 202), so that the user can view the information of the associated target data row 2022A by triggering the cell corresponding to the order number field of the second data row. Similarly, the entire row or a portion of the content of the target data row 2022A may be viewed through a hover operation, and a click or double-click may jump to the page of the first data table 202, in which the entire row or a portion of the content of the target data row 2022A associated with the cell is shown and highlighted.
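As a non-limiting illustration, the difference between the one-way and bidirectional associations described above might be sketched with a simple in-memory structure. The class `Row`, the function `associate`, and the field names `product` and `order_number` are hypothetical and are used here only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Row:
    row_id: str
    fields: dict = field(default_factory=dict)  # field name -> plain value
    links: dict = field(default_factory=dict)   # field name -> list of linked row ids

def associate(source, source_field, target, back_field=None):
    """Link `target` into `source_field` of `source`.

    If `back_field` is given, the reverse link is also recorded on
    `target`, which makes the association bidirectional.
    """
    forward = source.links.setdefault(source_field, [])
    if target.row_id not in forward:
        forward.append(target.row_id)
    if back_field is not None:
        backward = target.links.setdefault(back_field, [])
        if source.row_id not in backward:
            backward.append(source.row_id)

order = Row("order-1")
necklace = Row("product-7", {"name": "necklace"})
ring = Row("product-8", {"name": "ring"})
associate(order, "product", necklace, back_field="order_number")  # bidirectional
associate(order, "product", ring)                                  # one-way
```

In this sketch the order row can reach both commodities, while only the bidirectionally associated commodity row can reach the order row.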
From the above embodiments, it can be seen that the embodiments of the present disclosure provide a table data processing method that can improve the coverage and accuracy of data recommendation to a certain extent by fusing at least two recall strategies. In some embodiments, four complementary recall strategies are designed, so that potentially associated records (data rows) can be mined from different dimensions. In some embodiments, a union strategy is employed to merge the recall results, and a more comprehensive candidate set is obtained after deduplication. In some embodiments, a multi-level ranking strategy is adopted, in which the candidate sets are ranked according to the priority of the recall strategies and the ranking rule within each strategy, so that the accuracy of the recommendation result is improved. In some embodiments, starting conditions are set for each recall strategy, so that each strategy is started only in a suitable scenario, which improves recommendation efficiency and accuracy. In some embodiments, a fallback policy is employed to ensure availability of the recommendation system when none of the recall policies is triggered or when no data is recalled after triggering.
According to some embodiments of the present disclosure, multiple complementary recall strategies may be designed and applied for different data features and application scenarios, and each recall result is fused through a merge strategy, so that an optimal associated record candidate set is presented to a user according to a ranking strategy.
According to some embodiments of the present disclosure, by integrating multiple recall strategies, potential associated records can be mined from different dimensions, so that recall rates of the associated records are effectively improved, and valuable information is prevented from being omitted. According to some embodiments of the present disclosure, multiple recall strategies are used to complement each other, and LLM screening and multistage sorting strategies are combined, so that noise data can be effectively filtered, and accuracy and relevance of a recommendation result can be improved. Some embodiments of the present disclosure can adaptively select and combine appropriate recall policies for different data features and application scenarios, enhancing robustness of the system.
It will be appreciated that the data recommendation algorithm described above may be applied to more than just the scenarios described in the embodiments described above. In the age of information explosion, when users face mass data, it is important to quickly and accurately find the required information. Therefore, the data recommendation algorithm can be widely applied to various application scenes, such as knowledge base construction, data analysis, intelligent assistants and the like.
It should be noted that the method of the embodiments of the present disclosure may be performed by a single device, such as a computer or a server. The method of the embodiments may also be applied in a distributed scenario and completed by a plurality of devices cooperating with one another. In such a distributed scenario, one of the devices may perform only one or more steps of the method of the embodiments of the present disclosure, and the devices interact with each other to complete the method.
It should be noted that the foregoing describes some embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The disclosed embodiments also provide a computer device for implementing the above-described method 300. Fig. 5 shows a hardware architecture diagram of an exemplary computer device 500 provided by an embodiment of the present disclosure. The computer device 500 may be used to implement the server 106 of fig. 1, as well as the terminal devices 102A, 102B of fig. 1. In some scenarios, the computer device 500 may also be used to implement the database server 108 of FIG. 1.
As shown in FIG. 5, the computer device 500 may include a processor 502, a memory 504, a network interface 506, a peripheral interface 508, and a bus 510. The processor 502, the memory 504, the network interface 506, and the peripheral interface 508 are communicatively connected to one another within the computer device 500 via the bus 510.
The processor 502 may be a central processing unit (CPU), an image processor, a neural network processor (NPU), a microcontroller (MCU), a programmable logic device, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or one or more integrated circuits. The processor 502 may be used to perform functions related to the techniques described in this disclosure. In some embodiments, the processor 502 may also include multiple processors integrated as a single logical component. For example, as shown in FIG. 5, the processor 502 may include a plurality of processors 502a, 502b, and 502c.
The memory 504 may be configured to store data (e.g., instructions, computer code, etc.). As shown in FIG. 5, the data stored by the memory 504 may include program instructions (e.g., program instructions for implementing the method 300 of embodiments of the present disclosure) as well as data to be processed (e.g., the memory may store configuration files of other modules, etc.). The processor 502 may access the program instructions and data stored in the memory 504 and execute the program instructions to perform operations on the data to be processed. The memory 504 may include volatile storage or nonvolatile storage. In some embodiments, the memory 504 may include random access memory (RAM), read-only memory (ROM), optical disks, magnetic disks, hard disks, solid-state disks (SSD), flash memory, memory sticks, and the like.
The network interface 506 may be configured to enable the computer device 500 to communicate with other external devices via a network. The network may be any wired or wireless network capable of transmitting and receiving data. For example, the network may be a wired network, a local wireless network (e.g., Bluetooth, Wi-Fi, near field communication (NFC), etc.), a cellular network, the Internet, or a combination of the foregoing. It will be appreciated that the type of network is not limited to the specific examples described above.
The peripheral interface 508 may be configured to connect the computer device 500 with one or more peripheral devices to enable information input and output. For example, the peripheral devices may include input devices such as keyboards, mice, touchpads, touch screens, microphones, and various types of sensors, and output devices such as displays, speakers, vibrators, and indicators.
The bus 510 may be configured to transfer information between the various components of the computer device 500 (e.g., the processor 502, the memory 504, the network interface 506, and the peripheral interface 508), and may be, for example, an internal bus (e.g., a processor-memory bus) or an external bus (e.g., a USB port or a PCI-E bus).
It should be noted that, although the architecture of the computer device 500 described above illustrates only the processor 502, the memory 504, the network interface 506, the peripheral interface 508, and the bus 510, in a specific implementation, the architecture of the computer device 500 may also include other components necessary to achieve proper operation. Moreover, those skilled in the art will appreciate that the architecture of the computer device 500 described above may include only the components necessary to implement the disclosed embodiments, and not all of the components shown in the figures.
The embodiments of the present disclosure also provide a table data processing apparatus. FIG. 6 shows a schematic diagram of an exemplary apparatus 600 provided by an embodiment of the present disclosure. As shown in FIG. 6, the apparatus 600 may be used to implement the method 300 and may include the following modules:
a first determining module 602 configured to determine feature information of a target data row in a first data table in response to an associated data recommendation request for the target data row;
an obtaining module 604 configured to obtain, according to the feature information, at least one first data row matching the feature information in the first data table and/or at least one second data row matching the feature information in a second data table associated with the first data table;
a second determining module 606 configured to determine recommended association data of the target data row based on the at least one first data row and/or the at least one second data row matching the feature information; and
a generation module 608 configured to generate an associated data recommendation result of the target data row according to the recommended association data.
In some embodiments, the first determining module 602 is configured to extract a plurality of keywords from the target data row, and determine the feature information according to the plurality of keywords.
In some embodiments, the first determining module 602 is configured to generate a feature vector corresponding to the target data row according to the plurality of keywords, and determine the feature information according to the plurality of keywords and the feature vector.
In some embodiments, the obtaining module 604 is configured to:
acquire, according to the keywords, at least one first candidate data row matching the keywords in the first data table;
acquire, according to the feature vector, at least one second candidate data row matching the feature vector in the first data table; and
determine the at least one first data row matching the feature information according to the at least one first candidate data row and the at least one second candidate data row.
In some embodiments, the obtaining module 604 is configured to:
acquire, according to the keywords, at least one third candidate data row matching the keywords in the second data table;
acquire, according to the feature vector, at least one fourth candidate data row matching the feature vector in the second data table; and
determine the at least one second data row matching the feature information according to the at least one third candidate data row and the at least one fourth candidate data row.
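As a non-limiting illustration, the combination of keyword matching and feature-vector matching performed by the obtaining module 604 might be sketched as follows. Whitespace tokenization, cosine similarity with a threshold of 0.8, and merging the two candidate lists by union are assumptions made for this sketch; the disclosure does not prescribe a particular matching function, threshold, or embedding model, and the function names and the `text` and `vec` keys are hypothetical.

```python
import math

def keyword_recall(keywords, rows, min_hits=1):
    """Ids of rows whose text contains at least `min_hits` of the keywords."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for row in rows:
        tokens = set(row["text"].lower().split())
        if len(wanted & tokens) >= min_hits:
            hits.append(row["id"])
    return hits

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def vector_recall(query_vec, rows, threshold=0.8):
    """Ids of rows whose feature vector is similar enough to the query vector."""
    return [row["id"] for row in rows if cosine(query_vec, row["vec"]) >= threshold]

def hybrid_recall(keywords, query_vec, rows):
    """Union of both candidate lists, keeping first-seen order."""
    merged = keyword_recall(keywords, rows) + vector_recall(query_vec, rows)
    return list(dict.fromkeys(merged))

rows = [
    {"id": 1, "text": "gold necklace gift", "vec": [1.0, 0.0]},
    {"id": 2, "text": "silver ring", "vec": [0.9, 0.1]},
    {"id": 3, "text": "office chair", "vec": [0.0, 1.0]},
]
matched = hybrid_recall(["necklace"], [1.0, 0.0], rows)
```

In this example, row 2 shares no keyword with the query but is still recalled through its feature vector, which is the complementary effect the two candidate sources are intended to provide.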
In some embodiments, the associated data recommendation request includes a request to recommend associated data for a target field of the target data row, and the second determining module 606 is configured to:
determine at least one data row in the second data table that is associated with the target field of the at least one first data row matching the feature information as at least one first data row to be selected;
determine the at least one second data row matching the feature information as at least one second data row to be selected; and
determine the recommended association data corresponding to the target field of the target data row according to the at least one first data row to be selected and the at least one second data row to be selected.
In some embodiments, the obtaining module 604 is configured to determine at least one first edited data row in the first data table that was generated within a first preset time period before the current time, and to determine at least one data row in the second data table that is associated with the target field of the at least one first edited data row as at least one third data row to be selected;
the second determining module 606 is configured to determine the recommended association data corresponding to the target field of the target data row according to the at least one first data row to be selected, the at least one second data row to be selected, and the at least one third data row to be selected.
In some embodiments, the obtaining module 604 is configured to:
determine, in response to a third data row having been associated in the target field of the target data row, at least one other data row in the first data table that is also associated with the third data row in the target field;
determine, in response to the number of the at least one other data row being greater than a first number, at least one fourth data row with which the at least one other data row is also associated in the target field;
determine, for each fourth data row, whether the number of the other data rows associated with that fourth data row is greater than a second number; and
determine each fourth data row for which the number of associated other data rows is greater than the second number as one of at least one fourth data row to be selected;
the second determining module 606 is configured to determine the recommended association data corresponding to the target field of the target data row according to the at least one first data row to be selected, the at least one second data row to be selected, the at least one third data row to be selected, and the at least one fourth data row to be selected.
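As a non-limiting illustration, the co-occurrence-based determination of the fourth data rows to be selected might be sketched as follows. The function name `cooccurrence_recall`, the representation of each first-table row as a set of linked second-table row ids, and the default values of the first number and the second number are assumptions made for this sketch.

```python
from collections import Counter

def cooccurrence_recall(anchor, other_rows_links, first_number=2, second_number=1):
    """Recall rows that are frequently associated together with `anchor`.

    anchor: id of the third data row already associated in the target field.
    other_rows_links: dict mapping each other first-table row id to the set
        of second-table row ids associated in its target field.
    """
    # Other rows of the first table that are also associated with the anchor.
    peers = [links for links in other_rows_links.values() if anchor in links]
    if len(peers) <= first_number:
        return []  # too few such rows to infer a pattern
    # Count how many of those rows are associated with each further row.
    counts = Counter()
    for links in peers:
        counts.update(links - {anchor})
    return [row_id for row_id, n in counts.most_common() if n > second_number]

orders = {
    "o1": {"necklace", "earrings"},
    "o2": {"necklace", "earrings", "bracelet"},
    "o3": {"necklace", "earrings"},
    "o4": {"chair"},
}
suggested = cooccurrence_recall("necklace", orders)
```

In this example three other orders contain the necklace, and all three also contain earrings, so earrings are recalled; the bracelet appears in only one of them and is filtered out by the second number.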
In some embodiments, the second determining module 606 is configured to:
input the at least one first data row to be selected into a first screening model to obtain at least one first screening data row;
input the at least one second data row to be selected into a second screening model to obtain at least one second screening data row; and
determine the recommended association data corresponding to the target field of the target data row according to the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected, and the at least one fourth data row to be selected;
wherein the first screening model and the second screening model are large language models; the first screening model screens the at least one first data row to be selected based on a first prompt, and the second screening model screens the at least one second data row to be selected based on a second prompt; the first prompt comprises task information for screening the at least one first data row to be selected based on an association relation between the first data row to be selected and the first data table, and the second prompt comprises task information for screening the at least one second data row to be selected based on an association relation between the first data table and the second data table.
In some embodiments, the second determining module 606 is configured to:
determine at least one group of duplicate data rows among the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected, and the at least one fourth data row to be selected;
retain one data row of each group of duplicate data rows and delete the remaining duplicate data rows; and
determine the recommended association data corresponding to the target field of the target data row according to the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected, and the at least one fourth data row to be selected from which the remaining duplicate data rows have been deleted.
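As a non-limiting illustration, the merging and deduplication of the recall results might be sketched as follows. The function name `merge_and_deduplicate`, the use of an `id` key to identify duplicate rows, and the choice to retain the first occurrence of each duplicate are assumptions made for this sketch.

```python
def merge_and_deduplicate(*recall_results):
    """Union the recall results, retaining the first occurrence of each row id."""
    seen = set()
    merged = []
    for result in recall_results:
        for row in result:
            if row["id"] not in seen:
                seen.add(row["id"])
                merged.append(row)
    return merged

first_screened = [{"id": "p1"}, {"id": "p2"}]
second_screened = [{"id": "p2"}, {"id": "p3"}]
third_to_select = [{"id": "p1"}]
fourth_to_select = [{"id": "p4"}]
candidates = merge_and_deduplicate(
    first_screened, second_screened, third_to_select, fourth_to_select
)
```

Because the first occurrence is retained, the order in which the recall results are passed determines which copy of a duplicate survives in this sketch.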
In some embodiments, the second determining module 606 is configured to:
sort the at least one first screening data row according to the number of occurrences of the same data row in the at least one first screening data row and/or the editing time of each first screening data row, so as to obtain a first internal ordering corresponding to the at least one first screening data row;
sort the at least one second screening data row according to the degree to which the at least one second screening data row matches the plurality of keywords and/or the similarity between the at least one second screening data row and the feature vector, so as to obtain a second internal ordering corresponding to the at least one second screening data row;
sort the at least one third data row to be selected according to the number of occurrences of the same data row in the at least one third data row to be selected and/or the editing time of each third data row to be selected, so as to obtain a third internal ordering corresponding to the at least one third data row to be selected;
sort the at least one fourth data row to be selected according to the occurrence probability of each fourth data row to be selected in the at least one fourth data row to be selected, so as to obtain a fourth internal ordering corresponding to the at least one fourth data row to be selected;
determine an external ordering among the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected, and the at least one fourth data row to be selected; and
sort the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected, and the at least one fourth data row to be selected based on the external ordering, the first internal ordering, the second internal ordering, the third internal ordering, and the fourth internal ordering to obtain the recommended association data.
In some embodiments, the external ordering places the fourth internal ordering first, followed by the second internal ordering, the first internal ordering, and the third internal ordering;
the generation module 608 is configured to select a third number of data rows from the recommended association data as the associated data recommendation result of the target data row based on the external ordering, the first internal ordering, the second internal ordering, the third internal ordering, and the fourth internal ordering.
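As a non-limiting illustration, the multi-level ordering and the selection of the third number of data rows might be sketched as follows. The sketch assumes that each group of rows has already been placed in its internal order, and it skips a row that already appeared in a higher-priority group; the names `EXTERNAL_ORDER` and `rank_and_select` and the group labels are hypothetical.

```python
# External ordering of the four groups: fourth, second, first, third.
EXTERNAL_ORDER = ["fourth", "second", "first", "third"]

def rank_and_select(groups, top_k=3):
    """Concatenate internally ordered groups in external order and take top_k.

    groups: dict mapping a group label to a list of row ids that is already
        sorted by that group's internal ordering.
    top_k: the "third number" of rows to recommend.
    """
    ranked = []
    for source in EXTERNAL_ORDER:
        for row_id in groups.get(source, []):
            if row_id not in ranked:  # keep the highest-priority occurrence
                ranked.append(row_id)
    return ranked[:top_k]

groups = {
    "first": ["r1", "r2"],
    "second": ["r3"],
    "third": ["r4"],
    "fourth": ["r5", "r3"],
}
recommended = rank_and_select(groups)
```

Here the rows of the fourth group are ranked ahead of all others, and `r3` is listed once even though it was recalled by two strategies.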
In some embodiments, the obtaining module 604 is configured to obtain the at least one first data row matching the characteristic information in the first data table according to the characteristic information in response to the target data row not being empty and the first data table including the first data row for which an association operation has been performed.
In some embodiments, the obtaining module 604 is configured to obtain the at least one second data row matching the characteristic information in the second data table associated with the first data table according to the characteristic information in response to the target data row not being empty and the number of the second data rows of the second data table being greater than a fourth number.
In some embodiments, the obtaining module 604 is configured to obtain the at least one first edited data row in response to the first data table including the at least one first edited data row produced within the first preset time period from a current time and the target field of the first edited data row having performed an association operation.
In some embodiments, the obtaining module 604 is configured to determine at least one other data row in the first data table that is associated with the third data row in the target field, in response to determining that: the target field allows association of a plurality of the second data rows; a plurality of the first data rows in the first data table have been associated in the target field; the ratio of such first data rows in the first data table is higher than a preset ratio; the number of such first data rows in the first data table is greater than a fifth number; and a third data row has been associated in the target field of the target data row.
In some embodiments, the obtaining module 604 is configured to, in response to failing to obtain the at least one first screening data row, the at least one second screening data row, the at least one third data row to be selected, and the at least one fourth data row to be selected, determine, as the recommended association data of the target data row, at least one second edited data row in the second data table that was edited within a second preset time period before the current time, or a sixth number of third edited data rows closest to the current time.
For convenience of description, the above apparatus is described in terms of functionally divided modules. Of course, when implementing the present disclosure, the functions of the various modules may be implemented in the same one or more pieces of software and/or hardware.
The apparatus of the foregoing embodiments is configured to implement the corresponding method 300 in any of the foregoing embodiments and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
Based on the same inventive concept, the present disclosure also provides a non-transitory computer-readable storage medium containing a computer program, corresponding to any of the above-described embodiment methods, which when executed by one or more processors, causes the one or more processors to perform the method 300.
The computer-readable media of the present embodiments include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The storage medium of the above embodiment stores a computer program for causing the one or more processors to perform the method 300 according to any one of the above embodiments, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
Based on the same inventive concept, and corresponding to the method 300 of any of the embodiments described above, the present disclosure also provides a computer program product comprising a computer program. In some embodiments, the computer program is executable by one or more processors to cause the processors to perform the described method 300. For each step in the embodiments of the method 300, the processor executing that step may belong to the execution body corresponding to that step.
The computer program product of the above embodiment is configured to cause a processor to perform the method 300 of any of the above embodiments, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
It will be appreciated by persons skilled in the art that the foregoing discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the spirit of the disclosure, the features of the above embodiments or of different embodiments may also be combined, the steps may be implemented in any order, and many other variations of the different aspects of the disclosed embodiments described above exist, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the embodiments of the present disclosure. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present disclosure, and this also accounts for the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform on which the embodiments of the present disclosure are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The disclosed embodiments are intended to embrace all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Accordingly, any omissions, modifications, equivalent substitutions, improvements, and the like that are made within the spirit and principles of the embodiments of the disclosure are intended to be included within the scope of the disclosure.