CN120578805A

CN120578805A - Internet trade data search and processing method and system

Info

Publication number: CN120578805A
Application number: CN202510664281.9A
Authority: CN
Inventors: 杨之语; 卢麒琦; 佟健; 梁瑶; 马宁
Original assignee: Jiamusi University
Current assignee: Jiamusi University
Priority date: 2025-05-22
Filing date: 2025-05-22
Publication date: 2025-09-02

Abstract

An internet trade data search and processing method comprises: collecting data from global trade websites, e-commerce platforms, customs databases and social media channels; cleaning the data with machine learning algorithms and constructing a unified multilingual semantic index library; constructing a multi-layer index system that indexes keywords with an inverted index, indexes entities with a knowledge graph, and builds a time-series index from transaction time; understanding the user's search intent with deep learning; providing filtering and sorting across multiple dimensions such as price and market demand; recommending potential suppliers or trade opportunities; storing trade data on a blockchain; providing visual reports based on big data analysis; and displaying the trade activity intensity of different regions on a geographic heat map. By combining data across the dimensions of price, market demand and transaction frequency, the invention provides a more comprehensive and fine-grained analysis of trade activity and reflects market dynamics more accurately.

Description

Internet trade data search processing method and system

Technical Field

The invention belongs to the technical field of big data analysis, and particularly relates to a method and a system for searching and processing internet trade data.

Background

Many current trade data analysis methods focus on a single dimension, such as transaction amount or transaction frequency, and neglect multi-dimensional comprehensive analysis, especially in combination with geographic information. Conventional approaches often fail to reflect dynamic market changes and regional trade activity intensity in real time. Modern trade data comes from different platforms, in different languages and different formats, and integrating data from these sources is a major challenge. The prior art often lacks efficient mechanisms to process and fuse such multi-source heterogeneous data. Existing geographic visualization techniques, such as geographic heat maps, can intuitively display regional data distribution, but most of them display only static historical data and cannot reflect real-time market dynamics. In addition, existing GIS systems are inefficient at processing large-scale trade data. Most existing geographic information systems and trade data analysis platforms cannot update in real time or support dynamic geographic heat map display, and therefore cannot reflect the influence of market changes and emergencies in a timely manner.

Traditional prediction models are insufficiently accurate. Existing models, such as time series models based on linear regression or ARIMA, generally have low prediction accuracy and, in particular, cannot effectively capture the complex dynamics and potential trading opportunities of the market when processing large-scale, multidimensional trade data. They also lack the ability to integrate multidimensional data: existing predictive models can generally predict only from a limited single data source (e.g., price or demand) and cannot jointly analyze multiple dimensions (e.g., supply chain stability, market demand fluctuations and regional policy changes).

Disclosure of Invention

This section is intended to outline some aspects of embodiments of the application and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section as well as in the description of the application and in the title of the application, which may not be used to limit the scope of the application.

The present invention has been made in view of the above-mentioned or existing problems with the internet trade data search processing method and system.

In order to solve the technical problems, the invention provides the following technical scheme:

The embodiment of the invention provides an internet trade data search and processing method, which comprises: collecting data from global trade websites, e-commerce platforms, customs databases and social media channels, cleaning the data with a machine learning algorithm, translating and standardizing data in different languages, and constructing a unified multilingual semantic index library;

constructing a multi-layer index system, indexing keywords based on an inverted index technology, indexing entities based on a knowledge graph, establishing a time-series index according to transaction time, and updating the index in real time with a streaming data processing framework;

understanding the user's search intent using deep learning, providing filtering and sorting functions based on multiple dimensions of price, market demand, supply chain stability and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content-based recommendation and association rule mining;

storing trade data using a blockchain, and automatically verifying supplier qualifications using smart contracts;

providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction.

As a preferred scheme of the internet trade data search and processing method of the invention, cleaning the data with a machine learning algorithm, translating and standardizing data in different languages, and constructing a unified multilingual semantic index library comprises the following steps:

detecting abnormal prices, weights and transaction amounts with a machine learning algorithm, locating anomalous data points, and computing an anomaly score:

$$S(x) = \frac{x - \mu}{\sigma}$$

where S(x) is the anomaly score, μ is the data mean, and σ is the standard deviation;

detecting duplicate data using Jaccard similarity:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

if J(A, B) > 0.85, the records are considered duplicates and the lower-quality version is removed;

performing spelling and grammar correction with a deep learning model, constructing an inverted index and a semantic index, storing text vectors produced by BERT embeddings in FAISS, and accelerating cross-language search with approximate nearest neighbor search.

As a preferred scheme of the internet trade data search and processing method of the invention, constructing the multi-layer index system and indexing keywords based on the inverted index technology comprises the following steps:

the keyword weight is calculated by TF-IDF:

$$\text{TF-IDF}(t, D) = \mathrm{TF}(t, D) \times \log\frac{N}{df_t}$$

where N is the total number of documents and df_t is the number of documents containing the term t;

keyword indexes are stored in Elasticsearch, and results are ranked with BM25 at query time:

$$\mathrm{score}(D, Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot \frac{f(t, D)\,(k + 1)}{f(t, D) + k\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}$$

where D is the document to be retrieved, Q is the keyword set of the user query, t is a keyword in the query, f(t, D) is the frequency of t in D, |D| is the length of D, avgdl is the average document length, and k and b are tuning parameters.
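The BM25 ranking described above can be illustrated with a small pure-Python sketch. It is not Elasticsearch's implementation: the smoothed IDF variant, the default values k = 1.2 and b = 0.75, the function name `bm25_score` and the toy corpus are all assumptions chosen for illustration.

```python
import math

def bm25_score(query_terms, doc, corpus, k=1.2, b=0.75):
    """Score one tokenized document against a query with BM25."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue  # a term absent from the corpus contributes nothing
        # Smoothed IDF (an assumed variant that stays non-negative)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        freq = doc.count(term)
        norm = freq + k * (1 - b + b * len(doc) / avg_len)
        score += idf * freq * (k + 1) / norm
    return score

# Hypothetical tokenized product documents
corpus = [
    "steel pipe supplier china".split(),
    "cotton fabric exporter india".split(),
    "steel coil trade".split(),
]
query = ["steel", "pipe"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1]
```

The document matching both query terms ranks first, the one matching only "steel" second, and the unrelated document scores zero.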

As a preferred scheme of the internet trade data search and processing method of the invention, indexing entities based on the knowledge graph, establishing the time-series index according to transaction time, and updating the index in real time with a streaming data processing framework comprises the following steps:

extracting companies, products, suppliers, customers, transaction locations and transaction amounts from the raw trade data; identifying this information using natural language processing and named entity recognition; storing relationship data in a graph database; and merging similar entities with a clustering algorithm to resolve spelling differences of the same entity across data sources;

establishing a time-series index by timestamping transaction data and building time window indexes to optimize query performance;

connecting to data sources and receiving new trade data in real time, using Kafka or RabbitMQ as the message relay, parsing the new data, and identifying newly added or updated entities and transaction records.

As a preferred scheme of the internet trade data search and processing method of the invention, understanding the user's search intent using deep learning, providing filtering and sorting functions based on multiple dimensions of price, market demand, supply chain stability and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content-based recommendation and association rule mining comprises the following steps:

acquiring the historical quotations of suppliers, predicting and analyzing price fluctuations with an LSTM time-series model, and calculating the optimal-price supplier from the following quantities: P_s is the supplier quotation, ΔP_s is the price change trend over the past 6 months, and α is a price sensitivity parameter;

analyzing market demand trends based on global trade data, predicting sales volume changes with ARIMA, and preferentially recommending suppliers of high-demand products if market demand increases;

computing similar suppliers with item-based collaborative filtering, and mining high-frequency transaction patterns with the Apriori algorithm: if supplier A frequently transacts with B, then B is recommended as a potential partner.
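The association rule step can be illustrated with a simplified sketch that covers only the first two passes of Apriori (frequent single parties, then frequent pairs built from them). The function name `frequent_pairs`, the support threshold and the sample deals are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {pair: support} for pairs meeting min_support (Apriori, levels 1 and 2)."""
    n = len(transactions)
    # Pass 1: keep only parties that are frequent on their own
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    # Pass 2: count pairs built only from frequent parties (Apriori pruning)
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}

# Hypothetical deals, each listing the parties involved
deals = [["A", "B"], ["A", "B", "C"], ["A", "C"], ["A", "B"], ["B", "D"]]
pairs = frequent_pairs(deals, min_support=0.5)
print(pairs)  # {('A', 'B'): 0.6}
```

Here A and B appear together in 3 of 5 deals, so B would be a candidate recommendation for A's partners under this threshold.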

As a preferred scheme of the internet trade data search and processing method of the invention, storing trade data using the blockchain and automatically verifying supplier qualifications using smart contracts comprises the following steps:

storing data in a Merkle tree structure, storing the trade data itself in IPFS distributed storage, and storing the data hash values on the blockchain; when a supplier registers, a smart contract is called to submit the supplier's authentication information and the blockchain generates a unique identity; after authentication passes, the smart contract automatically updates the supplier's status.

As a preferred scheme of the internet trade data search and processing method of the invention, providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction comprises the following steps:

calculating the trade activity intensity of each region; analyzing the fluctuation and trend of trade activity in each region over different time periods; visualizing the trade activity intensity of different regions with heat maps, using GeoPandas to visualize the geographic data; making short-term predictions from the trend and seasonal variation of historical data; predicting trade demand over long time spans with a deep learning model; dividing regions into clusters and identifying regions with active trade; and predicting the future trade opportunities of each region across multiple dimensions.

An internet trade data search and processing system comprises: a collection and preprocessing module, used for collecting data from global trade websites, e-commerce platforms, customs databases and social media channels, cleaning the data with a machine learning algorithm, translating and standardizing data in different languages, and constructing a unified multilingual semantic index library;

an index construction module, used for constructing a multi-layer index system, indexing keywords based on an inverted index technology, indexing entities based on a knowledge graph, establishing a time-series index according to transaction time, and updating the index in real time with a streaming data processing framework;

a search optimization module, used for understanding the user's search intent using deep learning, providing filtering and sorting functions based on multiple dimensions of price, market demand, supply chain stability and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content-based recommendation and association rule mining;

a certification and verification module, used for storing trade data on a blockchain and automatically verifying supplier qualifications using smart contracts;

and a visualization and analysis module, used for providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction.

A computing device, the computing device comprising:

at least one processor, a memory, and an input/output unit;

wherein the memory is used for storing a computer program, and the processor is used for calling the computer program stored in the memory to execute the steps of the internet trade data searching processing method.

A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the steps of an internet trade data search processing method.

The invention has the following beneficial effects. By combining data across multiple dimensions such as price, market demand, supply chain stability and transaction frequency, it provides a more comprehensive and fine-grained analysis of trade activity and reflects market dynamics more accurately than traditional single-dimension analysis. By combining geographic information with trade data, it produces intuitive regional heat maps and trend charts that help users identify the trade activity intensity and potential market opportunities of different regions more easily. By adopting a streaming data processing framework, it updates and processes large-scale trade data in real time, so that the analysis results always reflect the latest state of the market.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of an internet trade data search processing method according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of an internet trade data search processing system according to an embodiment of the present invention.

Fig. 3 is a schematic structural diagram of a medium according to an embodiment of the present invention.

Fig. 4 is a schematic structural diagram of a computing device according to an embodiment of the present invention.

In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

Detailed Description

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.

Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

Examples

Referring now to fig. 1, fig. 1 is a flowchart illustrating a method for searching and processing internet trade data according to an embodiment of the present invention. It should be noted that embodiments of the present invention may be applied to any scenario where applicable.

The flow of the internet trade data searching processing method provided by the embodiment of the invention shown in fig. 1 comprises the following steps:

S1, collecting data from global trade websites, e-commerce platforms, customs databases and social media channels, cleaning the data with a machine learning algorithm, translating and standardizing data in different languages, and constructing a unified multilingual semantic index library.

Preferably, a machine learning algorithm is used to detect abnormal prices, weights and transaction amounts and to locate anomalous data points, and an anomaly score is computed:

$$S(x) = \frac{x - \mu}{\sigma}$$

Duplicate data is detected using Jaccard similarity:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

if J(A, B) > 0.85, the records are considered duplicates and the lower-quality version is removed;

Further, assume a data set containing multiple trade transaction records, each of which includes a transaction price, weight and amount. The data set is as follows:

Calculating the data mean and standard deviation:

The mean and standard deviation are calculated for price, weight and amount, respectively.

For example, suppose the price mean is μ_price = 170 and the price standard deviation is σ_price = 60. For transaction T003, whose price is 3000, the Z-score is:

$$S(3000) = \frac{3000 - 170}{60} \approx 47.17$$

Since the Z-score is much larger than 3, T003 is flagged as abnormal data.
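The arithmetic of this worked example can be checked with a short sketch. The function name `anomaly_score` and the constant `THRESHOLD` are illustrative names, and the mean and standard deviation are taken as given from the example rather than computed from the data set.

```python
def anomaly_score(x: float, mean: float, std: float) -> float:
    """Z-score S(x) = (x - mean) / std."""
    if std <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std

THRESHOLD = 3.0  # assumed cut-off, matching the "larger than 3" rule in the text

score = anomaly_score(3000, mean=170, std=60)
is_anomaly = abs(score) > THRESHOLD
print(round(score, 2), is_anomaly)  # 47.17 True
```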

Suppose two commodity descriptions are stored as data records A and B, and their Jaccard similarity is computed by treating each description as a set. If J(A, B) = 0.6, then because the similarity is less than 0.85, the two records are not considered duplicates.
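A minimal sketch of this duplicate check follows, assuming descriptions are compared as sets of lowercase whitespace-separated tokens. The two sample descriptions are hypothetical and were chosen so that the similarity comes out at 0.6, as in the example.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two descriptions treated as token sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty descriptions are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)

DUPLICATE_THRESHOLD = 0.85  # threshold stated in the text

record_a = "apple phone 64GB black"   # hypothetical record A
record_b = "apple phone 128GB black"  # hypothetical record B
similarity = jaccard(record_a, record_b)
is_duplicate = similarity > DUPLICATE_THRESHOLD
print(similarity, is_duplicate)  # 0.6 False
```

The two records share 3 of 5 distinct tokens, so they fall below the threshold and both are kept.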

S2, constructing a multi-layer index system, indexing keywords based on an inverted index technology, indexing entities based on a knowledge graph, establishing a time-series index according to transaction time, and updating the index in real time with a streaming data processing framework.

Preferably, the keyword weights are calculated using TF-IDF:

$$\text{TF-IDF}(t, D) = \mathrm{TF}(t, D) \times \log\frac{N}{df_t}$$

Preferably, companies, products, suppliers, customers, transaction locations and transaction amounts are extracted from the raw trade data. This information is identified using natural language processing and named entity recognition, relationship data is stored in a graph database, and similar entities are merged with a clustering algorithm to resolve spelling differences of the same entity across data sources.

Further, assume a document set in which document 1 is "apple phone 64GB black". In this document, the term "phone" occurs once and the total word count is 4, so TF("phone", D1) = 1/4 = 0.25.

Assume the document set has 3 documents and "phone" appears in all 3 of them (i.e., df_t("phone") = 3 and the total number of documents N = 3). Then IDF("phone") = log(3/3) = log(1) = 0. An IDF of 0 means that "phone" carries very little information in this document set and has no discriminating power. Combining TF and IDF for document 1: TF-IDF("phone", D1) = 0.25 × 0 = 0.
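The worked example can be reproduced with a few lines of Python. Only the first document comes from the text; the other two documents are hypothetical, added so that the term "phone" appears in all three.

```python
import math

def tf(term, doc_tokens):
    """Term frequency: occurrences of term divided by document length."""
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, corpus):
    """Inverse document frequency: log(N / df_t)."""
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        raise ValueError("term does not occur in the corpus")
    return math.log(len(corpus) / df)

corpus = [
    "apple phone 64GB black".split(),   # document 1 from the text
    "brand phone 128GB white".split(),  # hypothetical
    "budget phone 32GB blue".split(),   # hypothetical
]
tf_phone = tf("phone", corpus[0])
idf_phone = idf("phone", corpus)
tfidf_phone = tf_phone * idf_phone
print(tf_phone, idf_phone, tfidf_phone)  # 0.25 0.0 0.0
```

By contrast, a term such as "apple" that appears in only one of the three documents gets a positive IDF and therefore a non-zero weight.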

In Elasticsearch, an index containing the documents is first created. Each document contains multiple fields.

The product document data is then inserted into the Elasticsearch index. The text fields in each document are tokenized, and the index information is stored:

POST /products/_doc/1
{
  "name": "apple phone 64GB black",
  "description": "apple phone 64GB black, suitable for various applications"
}

Elasticsearch automatically splits the text fields of the document into terms and generates a corresponding index entry for each term. At query time, Elasticsearch calculates the weight of each term based on term frequency and inverse document frequency.

Further, Neo4j is selected as the graph database for storing the extracted entities and their relationships. Each node represents an entity, and entities are connected by relationships, for example "supplier A" supplies "product X" and "customer X" purchases "product Y".

Nodes of different types are created: company nodes store information such as company name, address and industry; product nodes store information such as product name, description and price; supplier nodes store information such as supplier name, address and supply capacity; and customer nodes store information such as customer name and purchase history. Graph relationships are then created from the relationships extracted from the data, for example "supplier A" "supplies" "product X", "customer X" "purchases" "product Y", and a "transaction" "occurs in" "New York". Transaction records are stored as relationships that represent the connections between nodes, and because each transaction occurs at a specific point in time, transaction data is connected to the related nodes by timestamp.
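The node and relationship layout described above can be sketched with a small in-memory structure. This is a stand-in for illustration only, not Neo4j or its driver API; the class `TradeGraph`, the relationship names and the sample properties are all hypothetical.

```python
from collections import defaultdict

class TradeGraph:
    """Tiny in-memory property graph: typed nodes plus labeled, timestamped edges."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(list)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_relation(self, src, rel, dst, **props):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.edges[src].append((rel, dst, props))

    def neighbors(self, src, rel):
        return [dst for r, dst, _ in self.edges[src] if r == rel]

g = TradeGraph()
g.add_node("supplier_a", "Supplier", name="Supplier A", supply_capacity=1000)
g.add_node("product_x", "Product", name="Product X", price=120.0)
g.add_node("customer_x", "Customer", name="Customer X")
g.add_node("product_y", "Product", name="Product Y", price=80.0)
g.add_relation("supplier_a", "SUPPLIES", "product_x")
# A transaction stored as a relationship carrying its timestamp and location
g.add_relation("customer_x", "PURCHASES", "product_y",
               timestamp="2025-03-14T09:30:00", location="New York")
print(g.neighbors("supplier_a", "SUPPLIES"))  # ['product_x']
```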

A timestamp is added to each transaction record, indicating the specific time at which the transaction occurred. According to transaction time, transaction data is divided into different time windows for storage, which speeds up time-based queries such as retrieving all transaction records for a given month. In Elasticsearch, a time-series index can be created on the timestamp field, and filtering by time range at query time improves query performance.

Query performance is further optimized using the time window index and a sharding strategy.
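The idea behind the time window index can be sketched as follows, assuming monthly windows. The class `TimeWindowIndex`, the record layout and the sample transactions are hypothetical; a production system would rely on the search engine's own date-based indexing rather than this in-memory stand-in.

```python
from collections import defaultdict
from datetime import datetime

def window_key(ts: datetime) -> str:
    """Map a timestamp to its monthly window, e.g. '2025-03'."""
    return ts.strftime("%Y-%m")

class TimeWindowIndex:
    """Buckets records by month so a monthly query touches one bucket only."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, record: dict) -> None:
        self._buckets[window_key(record["timestamp"])].append(record)

    def query_month(self, year: int, month: int) -> list:
        return list(self._buckets.get(f"{year:04d}-{month:02d}", []))

index = TimeWindowIndex()
index.add({"id": "T001", "timestamp": datetime(2025, 3, 2, 10, 0), "amount": 500})
index.add({"id": "T002", "timestamp": datetime(2025, 3, 28, 16, 45), "amount": 900})
index.add({"id": "T003", "timestamp": datetime(2025, 4, 1, 8, 15), "amount": 300})
march_ids = [r["id"] for r in index.query_month(2025, 3)]
print(march_ids)  # ['T001', 'T002']
```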

S3, understanding the user's search intent using deep learning, providing filtering and sorting functions based on multiple dimensions of price, market demand, supply chain stability and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content-based recommendation and association rule mining.

Preferably, the historical quotations of suppliers are obtained, price fluctuations are predicted and analyzed with an LSTM time-series model, and the optimal-price supplier is calculated.

Further, the system obtains supplier quotation data, including supplier ID, product ID, quotation date, quotation amount and transaction amount, and predicts future prices with an LSTM model. The inputs are the price change trend over the past 6 months (ΔP_s) and the market demand parameter (D), and the forecast covers prices for the next 3 months. The data is windowed to convert the time series into fixed-length inputs: the prices of the past 6 months serve as the input and the price of the next month as the target. The training set accounts for 80% of the data and the test set for 20%. The model is built and trained as follows:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Build the LSTM model: each input is 6 monthly prices with one feature per step
model = Sequential([
    LSTM(50, activation='relu', return_sequences=True, input_shape=(6, 1)),
    LSTM(50, activation='relu'),
    Dense(1)  # predicted price for the next month
])
model.compile(optimizer='adam', loss='mse')

# Train the model; X_train, y_train, X_test and y_test come from the windowed 80/20 split
model.fit(X_train, y_train, epochs=50, batch_size=16, validation_data=(X_test, y_test))
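The training call expects windowed inputs of shape (samples, 6, 1). As a minimal sketch of how such inputs might be prepared, the helpers below build 6-month windows and an 80/20 chronological split using plain lists; the function names and the price series are hypothetical, and the lists would still need converting to arrays (for example with `numpy.array`) before being passed to Keras.

```python
def make_windows(series, window=6):
    """Turn a price series into (6-step input, next-month target) pairs."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append([[v] for v in series[i:i + window]])  # shape (window, 1)
        y.append(series[i + window])
    return X, y

def chronological_split(X, y, train_ratio=0.8):
    """Split without shuffling so the test set lies strictly in the future."""
    cut = int(len(X) * train_ratio)
    return X[:cut], y[:cut], X[cut:], y[cut:]

prices = [100 + i for i in range(16)]  # hypothetical monthly quotations
X, y = make_windows(prices)
X_train, y_train, X_test, y_test = chronological_split(X, y)
print(len(X), len(X_train), len(X_test))  # 10 8 2
```

Splitting chronologically rather than randomly is a design choice made here so that the model is never evaluated on months that precede its training data.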

Future market demand is predicted from historical trading volume: the input is the trading-volume time series and the output is the sales forecast for the next 3 months. The Augmented Dickey-Fuller (ADF) test is used to check stationarity:

from statsmodels.tsa.stattools import adfuller

result = adfuller(sales_data)
print("P-value:", result[1])  # p < 0.05 indicates the series is stationary

If the data is not stationary, differencing is applied:

sales_data_diff = sales_data.diff().dropna()

The AR and MA orders are determined and the model is fitted:

from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(sales_data, order=(2, 1, 2))
model_fit = model.fit()
forecast = model_fit.forecast(steps=3)
print('Forecast sales for the next 3 months:', forecast)

If market demand grows over the next 3 months (ΔD > 0), suppliers of high-sales products are recommended preferentially:

recommended_supplier = supplier_with_high_sales

S4, storing trade data using a blockchain, and automatically verifying supplier qualifications using smart contracts.

Preferably, a Merkle tree structure is used for storage: trade data is stored in IPFS distributed storage, and the data hash values are stored on the blockchain. When a supplier registers, a smart contract is called to submit the supplier's authentication information, and the blockchain generates a unique identity. After authentication passes, the smart contract automatically updates the supplier's status.

Further, the supplier registration smart contract:

Trade data storage smart contract:

Uploading transaction data to IPFS:

Computing the Merkle tree root hash:
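As a minimal sketch of the Merkle root step only (not the listing from the original filing), the root hash of a batch of trade records could be computed as below. SHA-256 and the rule of duplicating the last node on odd-sized levels are assumptions, as are the function names and the sample records.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(records: list) -> str:
    """Return the hex Merkle root of a non-empty list of byte records."""
    if not records:
        raise ValueError("at least one record is required")
    level = [sha256(r) for r in records]  # leaf hashes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed convention: duplicate the last node
        # Hash each adjacent pair to form the next level up
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()

# Hypothetical serialized trade records
records = [b"T001|500|USD", b"T002|900|USD", b"T003|300|USD"]
root = merkle_root(records)
tampered = merkle_root([b"T001|500|USD", b"T002|901|USD", b"T003|300|USD"])
print(root != tampered)  # True
```

Only the root needs to be written to the chain; changing any single record changes the root, which is what makes later tampering detectable.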

S5, providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction.

Preferably, the trade activity intensity of each region is calculated; the fluctuation and trend of trade activity in each region over different time periods are analyzed; the trade activity intensity of different regions is visualized with heat maps, using GeoPandas to visualize the geographic data; short-term predictions are made from the trend and seasonal variation of historical data; trade demand over long time spans is predicted with a deep learning model; regions are divided into clusters and regions with active trade are identified; and the future trade opportunities of each region are predicted across multiple dimensions.

Further, the total trade amount, the number of transactions and the import/export amount are combined with weights to obtain a trade activity intensity index for each region, which quantifies how active trade is in that region.

The index is calculated as follows:

$$T_i = w_1 \sum \text{transaction amount} + w_2 \sum \text{number of transactions} + w_3 \sum \text{import/export amount}$$

where w_1, w_2 and w_3 are weights that can be optimized from historical data.
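The weighted index can be computed directly from a region's transaction records, as in the sketch below. The weights 0.5, 0.3 and 0.2, the field names and the sample data are hypothetical; since the three terms are on different scales, some normalization would probably be applied in practice, which this sketch omits.

```python
def trade_intensity(transactions, w1=0.5, w2=0.3, w3=0.2):
    """T_i = w1 * total amount + w2 * transaction count + w3 * import/export amount."""
    total_amount = sum(t["amount"] for t in transactions)
    count = len(transactions)
    import_export = sum(t["import_export_amount"] for t in transactions)
    return w1 * total_amount + w2 * count + w3 * import_export

# Hypothetical transactions for one region
region_transactions = [
    {"amount": 100.0, "import_export_amount": 50.0},
    {"amount": 200.0, "import_export_amount": 150.0},
]
intensity = trade_intensity(region_transactions)
print(round(intensity, 2))  # 190.6
```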

The daily, weekly and monthly average trade growth rates are calculated, and seasonal trends and periodic fluctuation patterns are identified.

An autoregressive integrated moving average (ARIMA) model is used for short-term trend prediction, forecasting trade changes over the next 1 to 6 months. A long short-term memory (LSTM) network is used to predict changes in trade demand over the next 1 to 2 years, combined with market economic data, policy adjustment information and consumption trends to improve prediction accuracy. A K-means clustering algorithm divides regions into different categories to identify trade-active regions, and an XGBoost machine learning model predicts the regions with higher future growth potential in order to optimize market layout.
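The clustering step can be illustrated with a compact pure-Python K-means sketch. Initializing the centroids from the first k points, the fixed iteration count, and the two-feature region vectors (for example intensity index and growth rate) are assumptions made to keep the example short and deterministic; a library implementation would normally be used instead.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain K-means: returns (cluster index per point, final centroids)."""
    centroids = [list(p) for p in points[:k]]  # assumed init: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

# Hypothetical (intensity, growth) features for six regions
regions = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
labels, centers = kmeans(regions, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

In this toy data the second cluster holds the high-intensity, high-growth regions, which are the ones the text refers to as trade-active.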

Having described the method of the exemplary embodiment of the present invention, an internet trade data search and processing system according to the exemplary embodiment of the present invention is next described with reference to fig. 2. The system includes:

the collection and preprocessing module, used for collecting data from global trade websites, e-commerce platforms, customs databases and social media channels, cleaning the data with a machine learning algorithm, translating and standardizing data in different languages, and constructing a unified multilingual semantic index library;

Having described the method and system of the exemplary embodiments of the present invention, a computer readable storage medium of the exemplary embodiments is next described with reference to fig. 3. As shown in fig. 3, the computer readable storage medium is an optical disc 30 on which a computer program (i.e., a program product) is stored. When executed by a processor, the computer program implements the steps described in the above method embodiments, for example: collecting data from global trade websites, e-commerce platforms, customs databases and social media channels, cleaning the data with a machine learning algorithm, translating and standardizing data in different languages, and constructing a unified multilingual semantic index library; constructing a multi-layer index system, indexing keywords based on an inverted index technology, indexing entities based on a knowledge graph, establishing a time-series index according to transaction time, and updating the index in real time with a streaming data processing framework; understanding the user's search intent using deep learning, providing filtering and sorting functions based on multiple dimensions of price, market demand, supply chain stability and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content-based recommendation and association rule mining; storing trade data on a blockchain and automatically verifying supplier qualifications using smart contracts; and providing visual reports based on big data analysis, displaying the trade activity intensity of different regions on a geographic heat map, and supporting regional trade opportunity prediction. The specific implementation of each step is not repeated here.

It should be noted that examples of the computer readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical or magnetic storage medium, which will not be described in detail herein.

Having described the methods, apparatus and media of exemplary embodiments of the present invention, a computing device for internet trade data search processing according to an exemplary embodiment of the present invention is next described with reference to fig. 4.

FIG. 4 illustrates a block diagram of an exemplary computing device 40 suitable for use in implementing embodiments of the invention, the computing device 40 may be a computer system or a server. The computing device 40 shown in fig. 4 is merely an example and should not be taken as limiting the functionality and scope of use of embodiments of the present invention.

As shown in FIG. 4, components of computing device 40 may include, but are not limited to, one or more processors or processing units 401, a system memory 402, and a bus 403 that connects the different system components (including system memory 402 and processing units 401).

Computing device 40 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computing device 40 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 402 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 4021 and/or cache memory 4022. Computing device 40 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, ROM4023 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4 and commonly referred to as a "hard disk drive"). Although not shown in fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media), may be provided. In such cases, each drive may be coupled to bus 403 through one or more data medium interfaces. The system memory 402 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.

A program/utility 4025 having a set (at least one) of program modules 4024 may be stored in, for example, system memory 402, and such program modules 4024 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 4024 generally perform the functions and/or methodologies of the described embodiments of the present invention.

Computing device 40 may also communicate with one or more external devices 404 (e.g., keyboard, pointing device, display, etc.). Such communication may occur through an input/output (I/O) interface 405. Moreover, computing device 40 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 406. As shown in fig. 4, network adapter 406 communicates with other modules of computing device 40, such as processing unit 401, etc., over bus 403. It should be appreciated that although not shown in fig. 4, other hardware and/or software modules may be used in connection with computing device 40.

The processing unit 401 performs various functional applications and data processing by running programs stored in the system memory 402, for example: collecting data from global trade websites, e-commerce platforms, customs databases, and social media channels; cleaning the data using machine learning algorithms, translating and standardizing data in different languages, and constructing a unified multilingual semantic index library; constructing a multi-layer index system, indexing keywords based on inverted index technology, indexing entities based on a knowledge graph, establishing a time series index according to transaction time, and updating the index in real time using a streaming data processing framework; understanding user search intent using deep learning, providing screening and sorting functions based on multiple dimensions of price, market demand, supply chain stability, and transaction frequency, and recommending potential suppliers or trade opportunities by combining collaborative filtering, content recommendation, and association rule mining techniques; storing trade data using blockchain and automatically verifying supplier qualifications using smart contracts; and providing a visual report based on big data analysis, displaying the trade activity intensity of different regions according to a geographic heat map, and supporting regional trade opportunity prediction.

The specific implementation of each step is not repeated here. It should be noted that while several units/modules or sub-units/sub-modules of the Internet trade data search and processing apparatus are mentioned in the above detailed description, such partitioning is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present invention. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.

In the description of the present invention, it should be noted that the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.

In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interface, device, or unit, and may be electrical, mechanical, or in another form.

The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

It should be noted that the foregoing embodiments are merely illustrative and not restrictive, and the scope of the invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed by the present invention, they may still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or substitute equivalents for some of the technical features. Such modifications, variations, or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Furthermore, although the operations of the methods of the present invention are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.

Claims

1. A method for searching and processing Internet trade data, comprising:

Collect data from global trade websites, e-commerce platforms, customs databases, and social media channels, use machine learning algorithms to clean the data, translate and standardize data in different languages, and build a unified multilingual semantic index library;

Build a multi-layer indexing system, index keywords based on inverted index technology, index entities based on knowledge graphs, establish a time series index based on transaction time, and use a streaming data processing framework to update the index in real time;

Use deep learning to understand user search intent, provide filtering and sorting functions based on multiple dimensions including price, market demand, supply chain stability, and transaction frequency, and combine collaborative filtering, content recommendation, and association rule mining techniques to recommend potential suppliers or trade opportunities;

Use blockchain to store trade data and use smart contracts to automatically verify supplier qualifications;

Provide visual reports based on big data analysis, display the intensity of trade activities in different regions according to geographic heat maps, and support regional trade opportunity forecasts.

2. The Internet trade data search and processing method according to claim 1, wherein the step of using a machine learning algorithm to clean the data, translate and standardize data in different languages, and construct a unified multilingual semantic index library comprises:

Machine learning algorithms are used to detect abnormal prices, weights, and transaction amounts, discover data points with abnormal density, and calculate anomaly scores:

Where S(x) is the anomaly score, μ is the data mean, and σ is the standard deviation;

Use Jaccard similarity to detect duplicate data:

$$J(A,B) = \frac{|A \cap B|}{|A \cup B|}$$

If J(A,B)>0.85, it is considered as duplicate data and the low-quality version is removed;

Use deep learning models for spelling and grammar correction, build inverted indexes and semantic indexes, use FAISS with BERT embeddings to store text vectors, and use approximate nearest neighbor search to accelerate cross-language searches.
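As an illustration of the cleaning steps in this claim, the following minimal sketch assumes that the anomaly score S(x) takes the z-score form |x − μ| / σ (an assumption based only on the symbols defined above) and that duplicate detection compares records as sets of lowercase tokens. The function names `anomaly_scores`, `jaccard`, and `is_duplicate` are hypothetical and chosen for this example.

```python
import statistics


def anomaly_scores(values):
    """Return an assumed z-score style S(x) = |x - mean| / std for each value."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return [0.0 for _ in values]
    return [abs(x - mu) / sigma for x in values]


def jaccard(a, b):
    """Jaccard similarity J(A, B) = |A ∩ B| / |A ∪ B| of two collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Two empty sets are treated as identical (a convention of this sketch).
        return 1.0
    return len(set_a & set_b) / len(union)


def is_duplicate(record_a, record_b, threshold=0.85):
    """Flag two text records as duplicates when J(A, B) exceeds the threshold."""
    tokens_a = record_a.lower().split()
    tokens_b = record_b.lower().split()
    return jaccard(tokens_a, tokens_b) > threshold


# Illustrative data only: the last price is far from the others.
prices = [10.0, 10.5, 9.8, 10.2, 55.0]
scores = anomaly_scores(prices)
most_anomalous = prices[scores.index(max(scores))]
```

In this sketch the price 55.0 receives the highest score. Which cut-off on S(x) marks a value as abnormal is not stated in the claim and would have to be chosen separately.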

3. The Internet trade data search and processing method according to claim 1, wherein the step of constructing a multi-layer index system and indexing keywords based on an inverted index technique comprises:

Keyword weights are calculated using TF-IDF:

The keyword index is stored in Elasticsearch and is sorted using BM25 when querying:

Among them, D is the document to be retrieved, Q is the keyword set of the user query, t is a keyword in the query, and k and b are adjustment parameters.
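As an illustration of the weighting and ranking described in this claim, the following sketch assumes the commonly used textbook forms of TF-IDF and BM25 (with the Lucene-style smoothed IDF). These forms, the default values k = 1.2 and b = 0.75, and the function names are assumptions made for the example; the claim itself only names D, Q, t, k, and b.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """Assumed form: TF(t, d) * log(N / DF(t)), with documents as token lists."""
    if not doc:
        return 0.0
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df) if df else 0.0


def bm25_score(query, doc, corpus, k=1.2, b=0.75):
    """Assumed BM25 form: sum over t in Q of IDF(t) * f(k+1) / (f + k(1 - b + b|D|/avgdl))."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    if avgdl == 0:
        return 0.0
    counts = Counter(doc)
    score = 0.0
    for t in query:
        df = sum(1 for d in corpus if t in d)
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        f = counts[t]  # frequency of keyword t in document D
        score += idf * f * (k + 1) / (f + k * (1 - b + b * len(doc) / avgdl))
    return score


# Illustrative corpus of tokenized trade listings.
docs = [
    ["steel", "pipe", "export"],
    ["cotton", "fabric", "export"],
    ["steel", "coil", "steel", "plate"],
]
query = ["steel"]
ranked = sorted(range(len(docs)), key=lambda i: -bm25_score(query, docs[i], docs))
```

Here `ranked` lists document indices from most to least relevant for the query. A search engine such as Elasticsearch computes a BM25 score internally, so a sketch like this is useful mainly for checking intuition about how k and b affect the ordering.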

4. The Internet trade data search and processing method according to claim 1, wherein the step of indexing entities based on knowledge graphs, establishing a time series index according to transaction time, and using a streaming data processing framework to update the index in real time comprises:

Extract companies, products, suppliers, customers, transaction locations, and transaction amounts from raw trade data. Use natural language processing and named entity recognition to identify this information. Use graph databases to store relational data. Use clustering algorithms to merge similar entities and resolve spelling differences between entities in different data sources.

Establish a time series index, timestamp transaction data, establish a time window index, and optimize query performance;

Connect to data sources, receive new trade data in real time, use Kafka or RabbitMQ as the data relay, parse the new data, and identify new or updated entities and transaction records.
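The time series and time window indexing in this claim can be sketched as follows. This is a minimal in-memory illustration that assumes a sorted list of timestamps searched by bisection; the class name `TimeSeriesIndex` and its methods are hypothetical, and a production system would more likely rely on the time-based indexing of its database or search engine.

```python
import bisect
from datetime import datetime


class TimeSeriesIndex:
    """Keeps record ids ordered by timestamp so window queries avoid full scans."""

    def __init__(self):
        self._timestamps = []  # sorted list of datetimes
        self._record_ids = []  # record ids aligned with _timestamps

    def add(self, timestamp, record_id):
        """Insert a newly received record at its sorted position."""
        pos = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(pos, timestamp)
        self._record_ids.insert(pos, record_id)

    def window(self, start, end):
        """Return ids of records with start <= timestamp <= end, in time order."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_right(self._timestamps, end)
        return self._record_ids[lo:hi]


# Records may arrive out of order from a stream; the index stays sorted.
index = TimeSeriesIndex()
index.add(datetime(2025, 3, 1), "tx-3")
index.add(datetime(2025, 1, 15), "tx-1")
index.add(datetime(2025, 2, 10), "tx-2")
january_to_february = index.window(datetime(2025, 1, 1), datetime(2025, 2, 28))
```

Each `add` call stands in for handling one message consumed from the data relay. The window lookup itself takes logarithmic time in the number of indexed records, plus the size of the result.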

5. The Internet trade data search and processing method according to claim 1, wherein the step of using deep learning to understand user search intent, providing screening and sorting functions based on multiple dimensions including price, market demand, supply chain stability, and transaction frequency, and combining collaborative filtering, content recommendation, and association rule mining techniques to recommend potential suppliers or trade opportunities comprises:

Get historical quotes from suppliers, use LSTM time series prediction to analyze price fluctuations, and calculate the supplier with the best price:

Where P_s is the supplier's quotation, ΔP_s is the price change trend in the past 6 months, and α is the price sensitivity parameter;

Analyze market demand trends based on global trade data and use ARIMA to predict sales changes. If market demand increases, prioritize suppliers of high-demand products.

Use item-based collaborative filtering to calculate similar suppliers and use the Apriori algorithm to mine high-frequency trading patterns:

If supplier A frequently trades with supplier B, then supplier B is recommended as a potential partner.
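The pattern mining step can be illustrated with a simplified sketch. It implements only the first two passes of an Apriori-style search (frequent single suppliers, then frequent pairs) rather than the full algorithm, and it assumes each transaction is recorded as the list of suppliers involved. The function names and the support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {(a, b): support} for supplier pairs meeting the support threshold."""
    n = len(transactions)
    if n == 0:
        return {}
    # Pass 1: keep only suppliers that are frequent on their own (Apriori pruning).
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    # Pass 2: count co-occurring pairs among the surviving suppliers.
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


def recommend_partners(supplier, pairs):
    """List suppliers that frequently co-occur with the given one, strongest first."""
    partners = []
    for (a, b), support in pairs.items():
        if a == supplier:
            partners.append((b, support))
        elif b == supplier:
            partners.append((a, support))
    partners.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in partners]


# Illustrative transactions, each listing the suppliers involved.
transactions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "D"],
    ["B", "C"],
    ["A", "B", "D"],
]
pairs = frequent_pairs(transactions, min_support=0.4)
partners_of_a = recommend_partners("A", pairs)
```

With these five transactions and a minimum support of 0.4, supplier B co-occurs with supplier A most often and is therefore listed first among A's potential partners.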

6. The Internet trade data search and processing method according to claim 1, wherein the use of blockchain to store trade data and the use of smart contracts to automatically verify supplier qualifications include:

A Merkle tree structure is used for storage, trade data is stored in a distributed manner using IPFS, and the data hash value is stored on the blockchain. When a supplier registers, the smart contract is called to submit authentication information. After the supplier submits the authentication information, the blockchain generates a unique identity. After the authentication is passed, the smart contract automatically updates the supplier status.
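To illustrate how a single hash value can stand for a batch of trade records, the following sketch computes a Merkle root with SHA-256. The choice of SHA-256, the rule of duplicating the last node on odd-sized levels, and the function names are assumptions of this example; the claim does not specify them, and IPFS and smart contract interaction are outside the sketch.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()


def merkle_root(records):
    """Compute a hex Merkle root over a non-empty list of byte-string records."""
    if not records:
        raise ValueError("at least one record is required")
    # Leaves are the hashes of the individual records.
    level = [sha256(r) for r in records]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of nodes: duplicate the last one (a convention of this sketch).
            level.append(level[-1])
        # Each parent is the hash of its two concatenated children.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


# Illustrative records; only the resulting root would be written on-chain.
records = [b"tx-1:steel:1000", b"tx-2:cotton:250", b"tx-3:copper:780"]
root = merkle_root(records)
tampered_root = merkle_root([b"tx-1:steel:9999", b"tx-2:cotton:250", b"tx-3:copper:780"])
```

Changing any record changes the root, so comparing a recomputed root with the stored hash value reveals tampering with the off-chain data.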

7. The internet trade data search and processing method according to claim 1, wherein providing a visualization report based on big data analysis, displaying the intensity of trade activities in different regions according to a geographic heat map, and supporting regional trade opportunity forecasting, comprises:

Calculate the intensity of trade activity for each region and analyze the fluctuations and trends of trading activities in each region over different time periods; visualize the intensity of trade activity in different regions through heat maps, use GeoPandas to visualize geographic data, make short-term forecasts based on trends and seasonal changes in historical data, and predict long-term trade demand through deep learning models; divide regions into different clusters, identify areas with active trade activities, and predict future trade opportunities in regions based on multiple dimensions.
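As a rough illustration of the intensity calculation that feeds a heat map, the sketch below assumes that a region's trade activity intensity is its total transaction value, min-max normalized to the range 0 to 1 so it can be mapped to a color scale. This definition and the function name `regional_intensity` are assumptions of the example; this claim does not fix the formula, and plotting with GeoPandas is left out.

```python
from collections import defaultdict


def regional_intensity(transactions):
    """Map each region to a 0..1 intensity from (region, amount) pairs."""
    totals = defaultdict(float)
    for region, amount in transactions:
        totals[region] += amount
    if not totals:
        return {}
    lowest, highest = min(totals.values()), max(totals.values())
    span = highest - lowest
    # Min-max normalization; if all regions are equal, assign full intensity.
    return {
        region: ((total - lowest) / span if span else 1.0)
        for region, total in totals.items()
    }


# Illustrative (region code, transaction amount) pairs.
sample = [("CN", 100.0), ("DE", 50.0), ("CN", 100.0), ("BR", 20.0), ("DE", 30.0)]
intensity = regional_intensity(sample)
```

The resulting dictionary could be joined to region geometries and passed to a plotting library as the color column of a heat map.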

8. An Internet trade data search and processing system, comprising:

The collection and preprocessing module collects data from global trade websites, e-commerce platforms, customs databases, and social media channels, cleans the data using machine learning algorithms, translates and standardizes data in different languages, and builds a unified multilingual semantic index library;

The index construction module is used to build a multi-layer index system. It indexes keywords based on inverted index technology, indexes entities based on knowledge graphs, establishes time series indexes based on transaction time, and uses a streaming data processing framework to update the index in real time.

The search optimization module uses deep learning to understand user search intent and provides filtering and sorting capabilities based on multiple dimensions such as price, market demand, supply chain stability, and transaction frequency. It also combines collaborative filtering, content recommendation, and association rule mining techniques to recommend potential suppliers or trading opportunities.

The evidence storage and verification module is used to store trade data using blockchain and automatically verify supplier qualifications using smart contracts;

The visualization and analysis module is used to provide visual reports based on big data analysis, display the intensity of trade activities in different regions according to geographic heat maps, and support regional trade opportunity forecasts.

9. A computing device, comprising:

at least one processor, memory, and input-output unit;

The memory is used to store a computer program, and the processor is used to call the computer program stored in the memory to execute the steps of the Internet trade data search and processing method according to any one of claims 1 to 7.

10. A computer-readable storage medium comprising instructions, which, when executed on a computer, enables the computer to execute the steps of the Internet trade data search and processing method according to any one of claims 1 to 7.