The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application shall fall within the protection scope of the present application.
In the embodiments of the present application, in order to improve the efficiency of classifying logistics objects and reduce labor costs, a code classification model may be established in advance; for example, it may specifically be a logistic regression model, a decision tree model, a neural network model, or the like. If the code classification model is a logistic regression model, it can be built through machine learning and then used to automatically classify logistics objects (which may specifically be commodity objects, etc.) so as to determine their corresponding codes. Specifically, as shown in FIG. 1, training samples may be collected, each recording a known correspondence between the text description information of a commodity object and a code such as an HScode; the data is then processed according to the characteristics of the training samples and the training objective, and input into a specific machine learning model for training, finally yielding a specific code classification model. Afterwards, the code classification model can be used to predict the code to which a specific logistics object belongs. The prediction result may be used directly as the code classification result, or merely as a reference for it, and so on.
In specific implementation, once the code classification model has been established, it can be provided to merchant users, customs declaration partners (CPs) of cross-border online sales systems, customs authorities, and the like for use during customs clearance of logistics objects, so that machine classification replaces or partially replaces traditional manual classification, improving clearance efficiency and reducing classification costs for enterprises. In addition, to make the model easier to use and to lower the technical threshold for using it, the code classification model may, as shown in FIG. 2, be further developed into an interface-based classification tool (an online tool, or an application that can be installed locally, etc.). A user then only needs to enter the text description information of the target logistics object to be classified through controls such as an input box provided in the interface; the tool automatically processes the input, invokes the pre-configured code classification model, and returns the final classification suggestion.
The specific implementation is described in detail below.
Embodiment 1
First, from the perspective of the aforementioned classification tool, Embodiment 1 of the present application provides a logistics object information processing method. In this method, a code classification model is first obtained; for example, it may specifically be a logistic regression model, a decision tree model, a neural network model, or the like. For a logistic regression model, the code classification model may store a feature-word weight vector for each code. For example, for the customs code HScode, the model may store a feature-word weight vector for each HScode; the feature-word weight vector records the discriminative weight of each feature word for the associated HScode.
In the case of the above logistic regression model, for HScodes, the code classification model may be built in advance. In one specific implementation, building the model may include the following steps:
Step 1: collect training samples, where each training sample records a known correspondence between the text description information of a logistics object and an HScode.
In specific implementation, labeled data from historical classification records of logistics objects can be collected, for example from the Import and Export Tariff of the People's Republic of China, historical customs clearance data, expert-labeled data, and so on.
For example, information recorded in the Import and Export Tariff of the People's Republic of China may be as shown in Table 1 (only one entry is shown):
Table 1
Of course, since the commodity descriptions recorded in the tariff usually do not refer to any particular item, historical customs clearance data from the cross-border online sales system may be used as a supplement to the tariff information. For example, one piece of historical customs clearance data may be as shown in Table 2:
Table 2
In other words, historical customs clearance data records correspondences between the names of specific logistics objects and HScodes; incorporating such data into the training samples therefore helps predict more accurate HScodes for specific logistics objects.
In addition to the tariff and the historical customs clearance data, expert-labeled data may also be used as a supplement. For example, one piece of expert-labeled data may be as shown in Table 3:
Table 3
In short, training samples can be collected through multiple channels. Of course, since the collected data is usually historical, and HScodes may in practice be changed, deactivated, or split, some historical records may already be invalid for subsequent classification. For this reason, in a preferred implementation, the training samples may be cleaned, so that only the remaining valid training samples are used to train the classification model.
The data cleaning process may include updating HScodes that have changed, deleting training samples whose HScodes have been deactivated, re-determining the HScodes of training samples whose HScodes have been split, and so on. In specific implementation, a mapping between old and new HScodes may first be saved in advance. After the training samples are collected, each sample is traversed to check whether it contains an old HScode; for samples containing an old HScode, the old code is replaced with the new one according to the mapping, and the sample is then added to the training sample set as a valid sample. For example, for a certain class of logistics objects, the HScode defined in a previous tariff was 6110110000; after a tariff revision, the HScode of that class was changed to 6110110011, so this mapping can be saved. If a collected training sample is found to contain HScode 6110110000, it can be changed to 6110110011 according to the saved mapping, making the sample valid.
In addition, a list of deactivated HScodes may be saved in advance; after the training samples are collected, the HScode of each sample is likewise traversed, and samples containing a deactivated HScode are deleted. For example, suppose HScode 6110110027 was deactivated after a tariff revision deleted its corresponding category. This HScode can be recorded, and if a collected training sample is found to carry HScode 6110110027, that sample is deleted.
Furthermore, an HScode may be split. For example, in an old version of the tariff, a category corresponded to HScode 6110110000; after a revision, the category was subdivided into two subcategories corresponding to HScodes 6110110001 and 6110110002, and the original HScode 6110110000 is no longer used. A list of split-HScode records may therefore be saved in advance, each record containing the pre-split HScode and the corresponding multiple post-split HScodes. After the training data is collected, the HScode of each record is traversed, and samples containing a pre-split HScode are extracted so that their post-split HScode can be re-determined before they are added back to the training sample set as valid samples. For example, if a training sample carries HScode 6110110000, it is extracted; its post-split HScode is then re-determined, for instance through expert confirmation, and substituted for the pre-split code, making the sample valid again, and so on.
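The three cleaning rules above (replace changed codes, drop deactivated codes, and set aside split codes for re-determination) can be sketched as follows. This is an illustrative sketch only; the sample codes, mappings, and helper name are example values, not part of the application.

```python
# Illustrative sketch of the three HScode cleaning rules. All codes below
# are example values; the split example uses a distinct code to avoid
# clashing with the remapping example.
REMAPPED = {"6110110000": "6110110011"}                # old HScode -> new HScode
DEACTIVATED = {"6110110027"}                           # deactivated HScodes
SPLIT = {"6110120000": ["6110120001", "6110120002"]}   # pre-split -> post-split

def clean_samples(samples):
    """samples: list of (text, hscode) pairs.
    Returns (valid_samples, samples_needing_relabel)."""
    valid, needs_relabel = [], []
    for text, code in samples:
        if code in DEACTIVATED:
            continue                                   # rule 2: drop the sample
        if code in REMAPPED:
            code = REMAPPED[code]                      # rule 1: replace old code
        if code in SPLIT:
            # rule 3: queue for expert re-determination among post-split codes
            needs_relabel.append((text, SPLIT[code]))
            continue
        valid.append((text, code))
    return valid, needs_relabel
```

Samples queued in the second return value would be re-labeled (for instance by expert confirmation) before rejoining the valid set.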
Besides the above data cleaning, the training samples may also be manually verified, for example by random sampling, to raise sample quality as much as possible and thereby improve the accuracy of the trained model.
Step 2: perform word segmentation on the text description information in the training samples and filter out invalid words to obtain feature words.
After data cleaning and other processing, the next step can proceed. The text description information in a training sample may be the title of a specific logistics object, a text description given in the tariff, the declaration elements submitted by a merchant at customs, and so on; in short, each training sample records a correspondence between text description information and an HScode. The goal of machine learning is to find regularities among the multiple text descriptions corresponding to the same HScode, which can then be used to predict HScodes. Processing the text description information begins with word segmentation: a description is usually a sentence or a paragraph, and segmentation divides it into individual words.
For example, the text description in one training sample might be: 春秋新款羊毛開衫女披肩外套薄針織衫短款V領小開衫寬鬆大碼毛衣 (roughly, "new spring/autumn wool cardigan, women's shawl-style jacket, thin knitwear, short V-neck cardigan, loose fit, plus-size sweater"). Segmentation might yield: 春秋/新款/羊毛/開衫/女/披肩/外套/薄/針織衫/短款/V領/小開衫/寬鬆/大碼/毛衣. Existing word segmentation techniques may be used for this; they are not described further here.
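Any off-the-shelf segmenter can serve here. As a minimal self-contained illustration, a greedy longest-match segmenter over a small example dictionary behaves as described; the tiny dictionary is an assumption for demonstration, not the application's segmentation method.

```python
# Minimal greedy longest-match word segmenter. The tiny dictionary is
# illustrative only; a production system would use a full segmentation tool.
DICT = {"羊毛", "開衫", "披肩", "外套", "針織衫", "毛衣", "新款", "春秋",
        "短款", "V領", "小開衫", "寬鬆", "大碼"}
MAX_LEN = max(len(w) for w in DICT)

def segment(text):
    """Split text into words by greedy longest dictionary match;
    unknown characters fall through as single-character tokens."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            if length == 1 or text[i:i + length] in DICT:
                words.append(text[i:i + length])
                i += length
                break
    return words
```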
After segmentation, words irrelevant to classification can be filtered out, leaving only valid feature words. To this end, named entity recognition may be applied to the words obtained from segmentation, and words irrelevant to logistics object classification are filtered out according to the recognition result. For the same example title above, segmentation followed by named entity recognition might yield:
春秋[season]/新款[promotional word]/羊毛[material]/開衫[category]/女[demographic]/披肩[style]/外套[category]/薄[style]/針織[weaving method]/短款[style]/V領[style]/小開衫[category]/寬鬆[style]/大碼[fashion style]/毛衣[category]
After named entity recognition, words irrelevant to classification, such as season, promotional, and fashion-style words, can be removed, leaving classification-relevant words such as category, material, and weaving method, which facilitates subsequent feature processing. Because the retained words better reflect the characteristics of a logistics object for HScode classification, they are called feature words.
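The entity-based filtering can be sketched as a simple lookup over tagger output. The toy tag table and the set of relevant labels are assumptions standing in for a real named-entity recognizer.

```python
# Sketch: keep only words whose named-entity label is relevant to HScode
# classification. Labels and the toy tag table are illustrative stand-ins
# for a real named-entity recognizer.
RELEVANT = {"category", "material", "weaving_method"}

EXAMPLE_TAGS = {
    "春秋": "season", "新款": "promotional", "羊毛": "material",
    "開衫": "category", "寬鬆": "style", "毛衣": "category",
}

def extract_feature_words(words, tags=EXAMPLE_TAGS):
    return [w for w in words if tags.get(w) in RELEVANT]
```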
Step 3: aggregate and deduplicate the feature words obtained from all training samples to obtain a feature word set, and assign each feature word a serial number.
After the feature words are obtained, those from all training samples can be aggregated and deduplicated into a feature word set. To express the text description of each training sample as a vector, so that probabilities can later be computed through vector operations, each feature word is also assigned a serial number. For example, if the feature words from all training samples total ten thousand, they can be numbered from 1 to 10000. Then, for each training sample, a feature word vector is generated according to which serial-numbered feature words it contains.
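The aggregation, deduplication, and numbering can be sketched in a few lines; the function name is illustrative, and 1-based numbering follows the description above.

```python
# Sketch: aggregate feature words across training samples, deduplicate,
# and assign each distinct word a 1-based serial number in first-seen order.
def build_vocabulary(samples_feature_words):
    vocab = {}
    for words in samples_feature_words:
        for w in words:
            if w not in vocab:
                vocab[w] = len(vocab) + 1
    return vocab
```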
Step 4: generate the feature word vector of each training sample according to which serial-numbered feature words the sample contains.
As described in step 3, once the feature word set and the serial numbers are available, the text description of each training sample can be expressed as a feature word vector generated according to which serial-numbered feature words the sample contains. That is, with ten thousand feature words in total, each training sample corresponds to a ten-thousand-dimensional vector. Because the feature words were aggregated from the segmented and filtered training samples, the feature words of any sample necessarily exist in the set; in other words, each sample's feature words form a subset of the feature word set. The value of each element of a sample's vector can be determined by whether a feature word exists at the corresponding serial number. For example, if a sample contains the feature words numbered 1, 12, 23, 25, 68, 1279, and so on, the elements at those serial numbers can be set to 1 and all others to 0, expressing which feature words the sample contains. Alternatively, an initial weight can be assigned to each element according to attributes of the feature word: if the sample contains the feature word at a serial number, the element is set to that word's initial weight, representing the word's importance for classifying into the commodity category of the corresponding HScode. For example, a generated feature word vector might be {1: 0.2, 4: 0.5, 12: 0.6, 1009: 0.3, 3801: 0.2, ...}, meaning the sample contains feature words 1, 4, 12, 1009, and 3801, among others, with initial weights 0.2, 0.5, 0.6, 0.3, and 0.2, respectively. Note that in this example the sample contains no features at the other serial numbers (2, 3, 5, 6, ...), so those elements are 0 and are not shown; in an actual implementation, the zero-valued elements still exist in the vector to facilitate operations such as vector multiplication.
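Both vectorization variants described above (0/1 indicators, or per-word initial weights) can be sketched with a sparse dictionary that stores only the non-zero entries; the function and parameter names are illustrative.

```python
# Sketch of the two vectorization variants: 0/1 indicator values, or
# per-word initial weights. The sparse dict {serial: value} stores only
# non-zero entries; absent serial numbers are implicitly 0.
def to_sparse_vector(words, vocab, initial_weights=None):
    vec = {}
    for w in words:
        if w in vocab:                       # unseen words are skipped
            idx = vocab[w]
            vec[idx] = (1.0 if initial_weights is None
                        else initial_weights.get(w, 1.0))
    return vec
```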
Of course, when every training sample corresponds to a vector of ten thousand or more dimensions, computation may consume considerable resources; and since the number of feature words in any one sample is usually tiny relative to the total dimensionality, most vector elements are 0, which can waste computing resources. For this reason, in an optional implementation, HScodes may be grouped in advance: HScodes whose commodity categories are strongly similar can be placed in one group forming a broad class, and so on. The grouping may also be based on information such as the category hierarchy defined in the online sales system, which associates that hierarchy with the customs HScodes and also enables more efficient classification during later prediction.
For example, the category hierarchy defined in the online sales system includes first-level categories such as clothing, daily necessities, home appliances, and computer consumables; each first-level category contains multiple second-level categories, which may in turn contain third-level categories, and so on down to the leaf categories. HScodes can be grouped by a chosen level of this hierarchy; different levels yield different numbers of groups and different numbers of HScodes per group, and the level can be chosen according to actual needs.
After grouping in this way, the classification model can be trained per group. The number of training samples within each group is smaller, so the total number of feature words is smaller, and the dimensionality of each sample's feature word vector is correspondingly lower, reducing computation and improving training efficiency.
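Partitioning the samples by HScode group, so that vocabulary building and training run per group, can be sketched as follows; the group mapping is an illustrative assumption.

```python
# Sketch: partition training samples by HScode group so that vocabulary
# building and model training run per group. The group mapping would come
# from the sales system's category hierarchy; values here are illustrative.
from collections import defaultdict

def group_samples(samples, hscode_to_group):
    """samples: list of (text, hscode). Returns {group: [(text, hscode), ...]}."""
    groups = defaultdict(list)
    for text, code in samples:
        groups[hscode_to_group[code]].append((text, code))
    return dict(groups)
```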
Step 5: for each HScode, input the feature word vectors of the training samples associated with that HScode into a preset machine learning model for training, obtaining a classification model for each HScode.
After the feature word vectors of the training samples are obtained, the vectors of the samples associated with the same HScode can be input into a preset machine learning model for training; that is, if 1000 training samples correspond to a given HScode, the feature vectors of those 1000 samples are input into the model. Various machine learning models may be used, including but not limited to classifiers such as SVM, LR, naive Bayes, and maximum entropy, as well as deep learning methods such as LSTM+softmax. After multiple iterations until the algorithm converges, the classification model for that HScode is obtained. The model can likewise be represented by a vector, for example {f1: w1, f2: w2, f3: w3, f4: w4, f5: w5, f6: w6, ...}, where fn is the serial number of a feature word and wn is the corresponding weight. That is, for a given HScode, the training result expresses how important the feature word at each serial number is for that HScode.
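As one concrete possibility among the models listed above, a minimal one-vs-rest logistic regression trainer over sparse feature word vectors might look like the following. The update rule, learning rate, and tiny dataset are illustrative assumptions, not the application's prescribed implementation.

```python
import math

def train_one_vs_rest(X, y, n_features, epochs=200, lr=0.5):
    """X: list of sparse vectors {serial: value} with 1-based serial numbers;
    y: list of HScode labels. Trains one logistic model per HScode via SGD
    and returns {hscode: dense weight list}, index 0 being the bias term."""
    models = {}
    for code in set(y):
        w = [0.0] * (n_features + 1)
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                z = w[0] + sum(w[i] * v for i, v in xi.items())
                p = 1.0 / (1.0 + math.exp(-z))          # sigmoid
                g = (1.0 if yi == code else 0.0) - p    # gradient of log-loss
                w[0] += lr * g
                for i, v in xi.items():
                    w[i] += lr * g * v
        models[code] = w
    return models

def predict_proba(models, x):
    """Score a sparse vector against every per-HScode weight vector."""
    scores = {}
    for code, w in models.items():
        z = w[0] + sum(w[i] * v for i, v in x.items())
        scores[code] = 1.0 / (1.0 + math.exp(-z))
    return scores
```

The dense weight list per code corresponds to the {fn: wn} representation described above.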
In short, after machine learning training, each HScode corresponds to a feature-word weight vector, and the weights at the same serial number may differ across vectors. The trained classification model can be persisted to a storage medium such as a disk, or, as described above, used to build an interface-based classification tool provided to various users.
Of course, besides the above logistic regression model, other types of classification model such as decision trees and neural networks may be used. For a decision tree model, the decision process may run word features through multiple trees: based on the split thresholds stored in each tree and the features of the feature word vector, the leaf node of each tree to which the logistics object belongs is determined, which in turn determines the probability that the object falls into the category of each candidate code such as an HScode. For a neural network model, the code classification model may have multiple layers of nonlinear units, each layer connected in series with the next; each layer stores feature weights based on the feature word vector or feature vectors derived from it, and the interaction of the layers yields the probability that the logistics object is classified into the category of each code. Further details are not elaborated here.
In addition, for codes other than HScodes, a code classification model can be obtained in a similar manner.
The process of building the code classification model can be completed in advance; afterwards, the model can be used to classify a target logistics object. Specifically, referring to FIG. 3, the method may include the following steps:
S301: determine the text description information of the target logistics object to be classified, process the text description information, and determine the target feature words it contains.
From this step on, the process uses the code classification model to predict the code of a specific target logistics object. First, the text description information of the target object is determined; it may be obtained from information such as the object's title. In specific implementation, if an interface-based tool is provided, as shown in FIG. 4, the interface may offer an entry, such as an input box, for entering the text description of the target object. Alternatively, an entry for batch-importing the text descriptions of multiple target objects may be provided: a user can organize the descriptions of the objects to be classified in advance, for example in an Excel sheet whose data columns are named according to pre-specified field names, and then import the recorded descriptions into the tool through the batch entry, and so on. Whether entered singly or imported in batch, the target logistics objects may be objects awaiting customs declaration; for example, the text information may be the titles of target objects extracted from specific cross-border orders, and so on.
S302: generate the feature word vector of the target logistics object according to which target feature words its text description information contains.
After the text description information of the target object is obtained, it can be processed in the same way as the text descriptions in the training samples: word segmentation is performed, invalid words are filtered out, and the remaining valid words are taken as target feature words. A feature word vector for the target object is then generated according to which target feature words the description contains, specifically, according to which serial-numbered feature words it contains. For example, if the description contains feature words 1, 5, 8, 27, and so on, the elements at those serial numbers can be set to 1, or to preset initial weights, with all other elements 0. In practice, the description of a target object may contain words never seen during training; such words can be filtered out and need not be input into the model. After the prediction, however, the named entity information of such a word can be used to determine whether it is relevant to HScode classification; if so, it can be added to the feature word set as a feature word and the model retrained, and so on.
It should be noted that the dimensionality of the feature word vector generated for the target object equals the number of feature words in the training-time feature word set. For example, if the feature words of all training samples were aggregated into a set of N words, the target object's vector is likewise N-dimensional. Also, if HScodes were grouped during training, the feature words were aggregated per group, so each group's feature word count is smaller. In that case, before generating the target object's feature word vector, the group to which the object belongs can first be determined; for example, if HScodes were grouped by the category hierarchy of an online sales system, the object's category in that hierarchy determines its HScode group. The feature word set of that group is then used to determine the target object's feature word vector.
S303: input the feature word vector into the code classification model to obtain the corresponding classification feature information.
After the target object's feature word vector is determined, it can be input into the code classification model to obtain classification feature information. For example, when a logistic regression model is used for HScode classification, the feature word vector can be input into the model to determine the probability that the target object belongs to the category of each HScode, and classification suggestions can be provided based on those probabilities. Specifically, the target object's feature word vector can be multiplied with the feature-word weight vector of each HScode (possibly adjusted by a bias term or the like) to obtain the probability that the object belongs to each HScode's category. If training was done per group, the target object's group information can be input into the model together with its feature word vector; the vector then only needs to be multiplied with the weight vectors of the HScodes within that group, rather than computing probabilities for all HScodes, which saves computation.
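The scoring step, a dot product of the sparse feature vector with each stored {serial: weight} vector plus a bias, passed through a sigmoid and filtered by a threshold, can be sketched as follows. The stored model values and the threshold are illustrative assumptions.

```python
import math

# Sketch: score a target object's sparse feature vector against stored
# per-HScode weight vectors and return codes above a probability threshold.
# Bias and weight values below are illustrative, not trained values.
MODEL = {
    "6110110011": {"bias": -1.0, "weights": {1: 2.0, 3: 1.5}},
    "6110120001": {"bias": -1.0, "weights": {2: 2.2}},
}

def classify(feature_vec, model=MODEL, threshold=0.5):
    suggestions = {}
    for code, m in model.items():
        z = m["bias"] + sum(m["weights"].get(i, 0.0) * v
                            for i, v in feature_vec.items())
        p = 1.0 / (1.0 + math.exp(-z))       # sigmoid
        if p >= threshold:
            suggestions[code] = round(p, 3)  # only codes above threshold
    return suggestions
```

Restricting the loop to the HScodes of the target object's group, when grouping is used, is a one-line change to the iteration.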
After the probabilities are computed, corresponding classification suggestions can be returned; for example, the one or more HScodes whose probability exceeds a preset threshold can be returned, and the user can determine the specific HScode for the target object based on these suggestions.
In short, through the embodiments of the present application, a code classification model can be determined in advance; for a target logistics object to be classified, its text description information is obtained and processed to determine the target feature words it contains, a feature word vector is generated according to which target feature words the description contains, and the vector is input into the code classification model to obtain the corresponding classification feature information. In this way, logistics objects can be classified automatically without relying on manual classification, improving both efficiency and accuracy.
In an optional embodiment, by collecting and processing training samples and performing machine learning training, a classification model for each HScode can be obtained, represented by a feature-word weight vector recording the discriminative weight of each feature word for the associated HScode. To make a prediction for a target data object, its text description is segmented and otherwise processed to determine the feature words it contains, and a feature word vector is generated; this vector is input into the previously trained classification model to compute the probability of the target logistics object being classified under each HScode, from which suggestions, such as one or several recommended HScodes, can be given. In this way, classification no longer depends entirely on experts, labor costs are reduced, and classification efficiency improves without being limited by an expert's experience or personal ability.
Embodiment 2
Embodiment 2 provides a method for generating a code classification model. Referring to FIG. 5, the method may include:
S501: collect training samples, where each training sample records a known correspondence between the text description information of a logistics object and a code;
The code may specifically be the customs code HScode described above.
S502: perform word segmentation on the text description information in the training samples and filter out invalid words to obtain feature words;
S503: aggregate and deduplicate the feature words obtained from all training samples to obtain a feature word set, and assign each feature word a serial number;
S504: generate the feature word vector of each training sample according to which serial-numbered feature words the sample contains;
S505: for each code, input the feature word vectors of the training samples associated with that code into a preset machine learning model for training, obtaining a classification model for each code.
For parts of Embodiment 2 not described in detail, refer to the description of Embodiment 1 above; they are not repeated here.
Corresponding to Embodiment 1, an embodiment of the present application further provides a logistics object information processing apparatus. Referring to FIG. 6, the apparatus may include:
a target logistics object information determining unit 601, configured to determine the text description information of the target logistics object to be classified, process the text description information, and determine the target feature words it contains;
a feature vector generating unit 602, configured to generate the feature word vector of the target logistics object according to which target feature words its text description information contains;
a classification feature information obtaining unit 603, configured to input the feature word vector into a code classification model to obtain the corresponding classification feature information.
The code classification model includes a logistic regression model, a decision tree model, or a neural network model.
If the code classification model is a logistic regression model, the model stores a feature-word weight vector for each code.
Specifically, the code includes the customs code HScode, and the code classification model stores a feature-word weight vector for each HScode; the feature-word weight vector records the discriminative weight of each feature word for the associated HScode.
If the code classification model is a decision tree model, the model stores multiple trees, each tree storing split thresholds and features of the feature word vector, so as to determine the probability that the target logistics object is classified into the category of each candidate code.
If the code classification model is a neural network model, the model has multiple layers of nonlinear units, each layer connected in series with the next; each layer stores feature weights based on the feature word vector or feature vectors derived from it, so that the interaction of the layers yields the probability that the logistics object is classified into the category of each candidate code.
In specific implementation, the classification feature information obtaining unit may be configured to input the feature word vector into the code classification model to determine the probability that the target logistics object is classified into the category of each candidate code, and may further provide classification suggestions based on those probabilities.
The code classification model is built by the following units:
a sample collecting unit, configured to collect training samples, where each training sample records a known correspondence between the text description information of a logistics object and an HScode;
a feature word determining unit, configured to perform word segmentation on the text description information in the training samples and filter out invalid words to obtain feature words;
a feature word aggregating unit, configured to aggregate and deduplicate the feature words obtained from all training samples to obtain a feature word set, and assign each feature word a serial number;
a feature word vector generating unit, configured to generate the feature word vector of each training sample according to which serial-numbered feature words the sample contains;
a training unit, configured to, for each HScode, input the feature word vectors of the training samples associated with that HScode into a preset machine learning model for training, obtaining a classification model for each HScode.
In specific implementation, the feature vector generating unit may be configured to generate the feature word vector of the target logistics object according to which serial-numbered feature words its text description information contains.
For model training, the apparatus may further include:
a data cleaning unit, configured to clean the training samples after they are collected, so that the remaining valid training samples are used to train the classification model.
Specifically, the data cleaning unit may be configured to:
save a mapping between old and new HScodes in advance; for training samples containing an old HScode, replace it with the new HScode according to the mapping, and then add the sample to the training sample set as a valid sample.
Alternatively, the data cleaning unit may be configured to:
save a list of deactivated HScodes in advance, and delete training samples containing a deactivated HScode.
Or the data cleaning unit may be configured to:
save a list of split-HScode records in advance, each record containing the pre-split HScode and the corresponding multiple post-split HScodes; and
extract training samples containing a pre-split HScode, so that the post-split HScode can be re-determined before the sample is added back to the training sample set as a valid sample.
In specific implementation, the apparatus may further include:
a word filtering unit, configured to perform named entity recognition on the words obtained from segmenting the text description information and, according to the recognition result, filter out words irrelevant to logistics object classification.
In addition, for model training, the apparatus may further include:
a grouping unit, configured to, before the feature words from the training samples are aggregated and deduplicated, group the HScodes according to the category information at one level of the category hierarchy of a related online sales system, obtaining multiple groups each containing multiple HScodes, so that feature word aggregation and deduplication, feature vector generation, and model training are performed per HScode group.
Specifically, the classification model may also store the correspondence between each group and its HScodes.
For prediction, the apparatus may further include:
a group determining unit, configured to determine the HScode group of the target logistics object according to its category in the category hierarchy of the online sales system;
the prediction unit may be configured to:
input the HScode group of the target logistics object together with the feature word vector into the classification model, so as to determine the probability that the target object belongs to the category of each HScode within that group.
Corresponding to Embodiment 2, an embodiment of the present application further provides an apparatus for generating a customs code classification model. Referring to FIG. 7, the apparatus may include:
a sample collecting unit 701, configured to collect training samples, where each training sample records a known correspondence between the text description information of a logistics object and a code;
a feature word determining unit 702, configured to perform word segmentation on the text description information in the training samples and filter out invalid words to obtain feature words;
a feature word aggregating unit 703, configured to aggregate and deduplicate the feature words obtained from all training samples to obtain a feature word set, and assign each feature word a serial number;
a feature word vector generating unit 704, configured to generate the feature word vector of each training sample according to which serial-numbered feature words the sample contains;
a training unit 705, configured to, for each code, input the feature word vectors of the training samples associated with that code into a preset machine learning model for training, obtaining a code classification model for each code; the code classification model stores a feature-word weight vector for each code, and the feature-word weight vector records the discriminative weight of each feature word for the associated code.
In addition, corresponding to Embodiment 1, an embodiment of the present application further provides a computer system, including:
one or more processors; and
a memory associated with the one or more processors, the memory being configured to store program instructions which, when read and executed by the one or more processors, perform the following operations:
determining the text description information of the target logistics object to be classified, processing the text description information, and determining the target feature words it contains;
generating the feature word vector of the target logistics object according to which target feature words its text description information contains; and
inputting the feature word vector into a code classification model to obtain the corresponding classification feature information.
FIG. 8 exemplarily shows the architecture of the computer system, which may include a processor 810, a video display adapter 811, a disk drive 812, an input/output interface 813, a network interface 814, and a memory 820. The processor 810, video display adapter 811, disk drive 812, input/output interface 813, network interface 814, and memory 820 may be communicatively connected through a communication bus 830.
The processor 810 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs so as to implement the technical solutions provided in the present application.
The memory 820 may be implemented as a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 820 may store an operating system 821 for controlling the operation of the computer system 800 and a basic input/output system (BIOS) 822 for controlling low-level operations of the computer system 800. It may also store a web browser 823, a data storage management system 824, a classification processing system 825, and so on; the classification processing system 825 may be the application that implements the operations of the foregoing steps in the embodiments of the present application. In short, when the technical solutions provided in the present application are implemented in software or firmware, the relevant program code is stored in the memory 820 and invoked and executed by the processor 810.
The input/output interface 813 is configured to connect input/output modules for information input and output. The input/output modules may be configured in the device as components (not shown) or externally connected to the device to provide corresponding functions. Input devices may include a keyboard, mouse, touch screen, microphone, and various sensors; output devices may include a display, speaker, vibrator, indicator light, and so on.
The network interface 814 is configured to connect a communication module (not shown) to enable communication between this device and other devices. The communication module may communicate in a wired manner (e.g., USB, network cable) or wirelessly (e.g., mobile network, WiFi, Bluetooth).
The bus 830 includes a path for transferring information between the components of the device (e.g., the processor 810, video display adapter 811, disk drive 812, input/output interface 813, network interface 814, and memory 820).
In addition, the computer system 800 may also obtain information on specific claiming conditions from a virtual resource object claiming condition information database 841 for use in condition judgment, and so on.
It should be noted that although only the processor 810, video display adapter 811, disk drive 812, input/output interface 813, network interface 814, memory 820, bus 830, and so on are shown for the above device, in specific implementation the device may also include other components necessary for normal operation. Moreover, those skilled in the art will understand that the device may also include only the components necessary to implement the solutions of the present application, without including all the components shown in the figure.
From the description of the above embodiments, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solutions of the present application, or the part contributing to the prior art, can be embodied in the form of a software product, which can be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and which includes instructions that cause a computer device (a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application or in certain parts thereof.
The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, the system or system embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The systems and system embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments, which those of ordinary skill in the art can understand and implement without creative effort.
以上對本申請所提供的物流物件資訊處理方法、裝置及電腦系統,進行了詳細介紹,本文中應用了具體個例對本申請的原理及實施方式進行了闡述,以上實施例的說明只是用於幫助理解本申請的方法及其核心思想;同時,對於本領域的一般技術人員,依據本申請的思想,在具體實施方式及應用範圍上均會有改變之處。綜上所述,本說明書內容不應理解為對本申請的限制。The technical solutions in the embodiments of the present application will be described clearly and completely in combination with the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all the embodiments. Based on the embodiments in the present application, all other embodiments obtained by persons of ordinary skill in the art fall within the protection scope of the present application. In the embodiments of the present application, in order to improve the logistics object classification efficiency and reduce the labor cost, a coding classification model may be established in advance, for example, it may specifically include a logistic regression model, a decision tree model, a neural network model, and so on. Among them, if the coding classification model is a logistic regression model, you can establish a coding classification model through machine learning, and then automatically classify logistics objects (specifically, commodity objects, etc.) through the coding classification model to determine the corresponding coding implementation. . Specifically, as shown in FIG. 1, some training samples may be collected, which may specifically be the correspondence between known text description information of commodity objects and encodings such as HScode. Then, based on the characteristics of the training samples and training targets, the The data is processed and input into a specific machine learning model for training, and finally a specific coding classification model is established. After that, this code classification model can be used to predict the code to which a specific logistics item belongs. 
The specific prediction result can be directly used as the code classification result, or can also be used as a reference for the code classification result, and so on. In specific implementation, after the coding classification model is established, it can be provided to merchant users, customs declaration partners (CP) of cross-border online sales systems, customs departments, etc. in the customs clearance process of logistics objects, so as to replace or Partially replace the traditional manual classification, improve the efficiency of customs clearance of logistics items, and reduce the cost of enterprise classification. In addition, for the convenience of users, the technical threshold for using the above model is lowered. As shown in Figure 2, it can be further developed into an interface-based classification tool (which can be an online tool, or it can be installed into a local application, etc.) based on the coding classification model. Control items such as the input box provided in the interface, enter the text description information of the target logistics object to be classified, the classification tool can automatically process, and call the pre-configured coding classification model to give the final classification suggestions. The specific implementation scheme is described in detail below. Embodiment 1 First, Embodiment 1 of the present application provides a method for information processing of logistics objects from the perspective of the aforementioned classification tool. In this method, first, a coding classification model can be obtained, for example, it can specifically include a logistic regression model, a decision Tree model, neural network model, etc. Among them, for the logistic regression model, the specific code classification model may store the feature word weight vector corresponding to each code. 
For example, for the customs code HScode, the characteristic word weight vector corresponding to each HScode may be specifically stored in the coding classification model; the characteristic word weight vector records the discrimination weight value of each characteristic word for the associated HScode. In the case of the above logistic regression model, for HScode, the specific coding classification model may be pre-built. In one of the specific implementations, the step of establishing the model may first include: Step 1: Collect training samples, where each The training sample includes the known correspondence between the text description information of the logistics objects and the HScode; in specific implementation, the marked data in the historical classification records of the logistics objects can be collected, for example, it can include the "People's Republic of China Import and Export Tariff" , Historical customs clearance data, expert labeling data, etc. For example, the information recorded in the "People's Republic of China Import and Export Tariff" can be shown in Table 1 (only one is shown): Table 1 Of course, because the product descriptions recorded in specific tariffs do not usually refer to a particular product, in order to better supplement the information in the above tariffs, historical customs clearance data in the cross-border online sales system can also be used as supplements. For example, a piece of historical customs clearance data can be shown in Table 2: Table 2 In other words, the historical customs clearance data records the correspondence between the names of specific logistics objects and HScode. Therefore, incorporating such data into specific training samples can be more conducive to predicting more specific logistics objects. Accurate HScode. In addition to the above tariffs and historical customs clearance information, it can also be supplemented by expert labeling data. 
For example, one of the expert labeling data can be shown in Table 3: Table 3 In short, training samples can be collected through multiple channels. Of course, because the collected data is usually some historical data, in actual application, it may involve some HScode changes, or suspension, spin-off, etc., so that some historical data for subsequent classification and Words may already be invalid information. For this reason, in a preferred embodiment, the training samples may also be cleaned in order to use the remaining valid training samples to train the classification model. The specific data cleaning process may include modifying the changed HScode, deleting the training samples corresponding to the deactivated HScode, re-determining the HScode in the training samples corresponding to the split HScode, and so on. Among them, in the specific implementation, you can first save the new and old HScode mapping relationship information; in this way, after collecting the specific training samples, you can first traverse each training sample to determine whether the old HScode in each sample appears HScode. For the training samples where the old HScode appears, the new HScode can be replaced according to the mapping relationship, and then added to the training sample set as an effective training sample. For example, for a certain type of logistics object, the HScode defined in the previous tariff code is 6110110000. After the tariff code is modified later, the HScode of the logistics object is modified to 6110110011. Therefore, this mapping relationship can be saved. After the training sample is collected, if the HScode contained in a training sample is found to be 6110110000, it can be modified to 6110110011 according to the saved mapping relationship, and then the training sample can be made effective data. 
In addition, a list of disabled HScodes can also be saved in advance; in this way, after the training samples are collected, the HScodes in each training sample can also be traversed, and the training samples in which the disabled HScodes appear will be deleted. For example, a certain HScode is 6110110027, and later after the tax code is revised, the corresponding category of the HScode is deleted, and accordingly, the HScode is deactivated. Therefore, this HScode can be recorded, and after a specific training sample is collected, if the HScode in a training sample is found to be 6110110027, the training sample corresponding to the HScode can be deleted. In addition, there may be cases where HScode is split. For example, in the old version of the tax code, the HScode corresponding to a category is 6110110000. After the tax code is revised, the category is divided into two subcategories, respectively. Corresponding to HScode6110110001, and6110110002, the original HScode6110110000 is no longer used. Therefore, a list of split HScode information can also be saved in advance, where each split HScode information includes the HScode before the split and the corresponding multiple HScodes after the split; in this way, after collecting specific training data, It is also possible to traverse the HScode in each piece of data, extract the training samples of the HScode before the split, so as to re-determine the split HScode, and then add it to the training sample set as an effective training sample. For example, if the HScode in one of the training samples is 6110110000, it can be extracted. After that, you can re-determine the split HScode for the training sample through expert confirmation and other methods, and then replace the HScode before the split to make the training sample a valid data, and so on. 
In addition, in addition to the above data cleaning on the training sample, manual verification can also be performed by randomly sampling the training sample to improve the quality of the training sample as much as possible to improve the accuracy of the final trained model. Step 2: Perform word segmentation on the text description information in the training sample, and filter out invalid words to obtain feature words; After performing data cleaning and other processing on the training sample, the next step of processing can be performed. Specifically, because there is text description information of the data object in the training sample, this text description information can be the title of the specific logistics object, or it can also be the text description given in the tariff, the declaration element when the merchant declares the customs, etc. . In short, the correspondence between text description information and HScode is recorded in each training sample. The purpose of machine learning is to find regular information from multiple pieces of text description information corresponding to the same HScode, which can be used to predict HScode. Specifically, when processing text description information, it can first include word segmentation processing, that is, text description information is usually a sentence, or a paragraph, and the purpose of word segmentation is to divide it into multiple words. For example, the text description information in a training sample is: Spring and Autumn new wool cardigan women shawl jacket thin knitted sweater short V-neck small cardigan loose plus size sweater. The word segmentation results obtained after the word segmentation process can be: Spring/Autumn/New/Wool/Cardigan/Female/Shawl/Coat/Thin/Knitwear/Short/V-neck/Small Cardigan/Loose/Large Size/Sweater. The specific word segmentation processing method may refer to the solution in the prior art, and will not be repeated here. 
After word segmentation is completed, vocabulary words that are not related to classification can also be filtered out, leaving only valid feature words. In order to achieve this purpose, the vocabulary obtained by the word segmentation result of the text description information may also be subjected to named entity recognition, and according to the recognition result of the named entity, vocabulary that is not related to the classification of logistics objects may be filtered out. For example, assuming that the text description information of a training sample is: Spring and Autumn new wool cardigan women shawl jacket thin sweater short V-neck small cardigan loose large size sweater, then after segmenting and performing named entity recognition, the result can be: Spring and Autumn [season]/new style[shopping word]/wool[material]/cardigan[category]/female[crowd]/shawl[style]/coat[category]/thin[style]/knitting[weaving method]/short style[ Style]/V-neck [style]/small cardigan [category]/loose [style]/oversize [style]/sweater [category] After the above named entity recognition, you can remove words that are not related to classification, such as season , Shopping guide, style, etc., leaving the category, material, weaving method and other words related to the classification, to facilitate subsequent feature processing. Because this left word can better reflect the characteristics of specific logistics objects when HScode is classified, it can be called a characteristic word. 
Step 3: Summarize and de-duplicate the feature words obtained from each training sample to obtain a set of feature words, and assign corresponding serial numbers to each feature word; after obtaining the feature words, you can analyze the features in each training sample Words are summarized and deduplicated to obtain a set of feature words, and in order to facilitate the expression of the text description information in each training sample in a vector way, subsequent probability calculation can be performed by vector calculation, and each feature can also be separately The word is assigned a corresponding serial number. For example, assuming that the feature vocabulary in each training sample is set together, there are 10,000 feature words in total, then these feature words can be numbered from 1 to 10000 respectively. In this way, for each training sample, it is only necessary to generate a corresponding feature word vector according to the inclusion of the feature words on each serial number. Step 4: According to the inclusion of the feature words on each serial number in each training sample, generate the feature word vector corresponding to each training sample; as described in step 3, after the feature word set is generated and the respective serial numbers are obtained When expressing the text description information in each training sample, the feature word vector corresponding to each training sample can be generated according to the inclusion of the feature words on each serial number of each training sample. That is to say, assuming there are 10,000 feature words in total, each training sample can correspond to a 10,000-dimensional feature word vector. Since the above feature words are obtained by segmenting and filtering according to each training sample, the feature words contained in each training sample will definitely exist in the feature word set. 
That is, the feature words included in each training sample are a subset of the above feature word set. Among them, for a training sample, the value of each element in the feature word vector can be determined according to whether there is a feature word on the corresponding serial number. For example, the feature words included in a training sample are No. 1, 12, No. 23, No. 25, No. 68, No. 1279, etc., then the feature word vector of the training sample can be corresponding to the above sequence number The element value is 1, and the element values on other serial numbers are 0, to express which feature words are included in the training sample. Or, in another implementation, you can also assign initial weights to the element values on each sequence number based on the information such as the attributes of specific feature words. If there are feature words on a sequence number in the training sample, you can correspond to the sequence number The element value of is set as the initial weight of the characteristic word, which represents the importance of the classification of the characteristic word corresponding to the HScode product category. For example, the generated feature word vectors can be {1: 0.2, 4: 0.5, 12: 0.6, 1009: 0.3, 3801: 0.2...}, that is, the training sample contains feature words No. 1 and feature words No. 4 , The 12th feature word, the 1009 feature word, the 3801 feature word, etc., and their corresponding initial weights are 0.2, 0.5, 0.6, 0.3, 0.2, etc., respectively. It should be noted that, in the above example, since there is no feature of other serial numbers (for example, 2, 3, 5, 6, ...) in the training sample, the corresponding element value is 0, which is not shown in the above vector. In the specific implementation, in order to facilitate multiplication between vectors, the element value at 0 is also present in the specific vector. 
Of course, in the specific implementation, each training sample corresponds to a vector of 10,000 or more dimensions. When calculating, there may be a case of occupying relatively large computing resources, and due to the characteristics contained in each training sample The number of words is usually very small relative to the total number of dimensions of the vector. Therefore, the value of most elements in the vector is 0. Therefore, it may cause a waste of computing resources. For this reason, in alternative embodiments, HScodes can also be grouped in advance. For example, certain commodity categories corresponding to HScodes have relatively strong similarities, so they can be divided into a group to form a large category, and so on. Among them, the grouping basis for grouping HScodes can also be information such as the category system defined in the online sales system, so that the category system defined in the online sales system can be associated with this customs HScode, It is also convenient for more efficient classification and prediction when making subsequent predictions. For example, the category system defined in the online sales system includes first-level categories such as clothing, daily necessities, home appliances, and computer consumables. Each first-level category also includes multiple second-level categories, and the second-level category can also include third-level categories. Head, and so on, and finally to the leaf category. When grouping HScode, you can group according to a certain category in the specific category system. According to different category levels, the number of HScode groups divided will be different, and the number of HScode contained in each group is also Will be different. It can be selected according to actual needs. 
After grouping in the above manner, the classification model can be trained in each group as a unit, so that the number of training samples within each group will be reduced, so the total number of corresponding feature words will also be reduced. Finally, the dimension of the feature word vector corresponding to each training sample will also be reduced, thereby reducing the amount of calculation and improving the training efficiency. Step 5: Input feature word vectors corresponding to multiple training samples associated with the same HScode respectively into the preset machine learning model for training, to obtain classification models corresponding to each HScode. After obtaining the feature word vectors of each training sample, you can input the feature word vectors corresponding to multiple training samples associated with the same HScode into the preset machine learning model for training, that is, assume that the training sample corresponds to a certain There are a total of 1000 training samples for one HScode, then the feature vectors corresponding to the 1000 training samples can be input into the machine learning model for training. Among them, there may be multiple specific machine learning models, for example, it may include but not limited to classification models such as SVM, LR, naive Bayes, maximum entropy, and deep learning methods such as lstm+softmax, etc. After multiple rounds of repeated operations until the algorithm converges, the classification model corresponding to the HScode can be obtained. The classification model can also be represented by a vector, for example, {f1: w1, f2: w2, f3: w3, f4: w4, f5: w5, f6: w6...}, where fn represents the serial number of specific feature words, wn represents the corresponding weight. That is, for a certain HScode, the training result is used to express, and the importance of the feature word corresponding to each serial number for the HScode. 
In short, after machine learning training, each HScode can correspond to a feature word weight vector, and in different feature word weight vectors the weights corresponding to the feature word on the same serial number may differ. The trained classification model can be persistently stored in storage media such as magnetic disks, or, as described above, an interface-based classification tool can be generated on the basis of the model and provided to various users. Of course, in addition to the above logistic regression model, other types of classification model, such as a decision tree model or a neural network model, can also be used. For the decision tree model, the decision process can be based on word features in multiple tree models: according to the split thresholds stored in each tree and the characteristics of the feature word vector, it is determined which leaf node of each tree the logistics object falls into, which in turn determines the probability that the logistics object is classified into the category corresponding to each potential HScode. For the neural network model, the specific coding classification model can have multiple layers of non-linear transformation units, the non-linear transformation units of each layer being connected in series with those of the next layer; each layer stores feature weights based on the feature word vector or on feature vectors derived from it, and the probability that the logistics object is classified into the category corresponding to each code is obtained through the interaction of the multiple layers of non-linear transformation units. More specific details will not be elaborated here. In addition, for codes other than HScode, a code classification model can also be obtained in a similar manner. The above process of establishing a coding classification model may be completed in advance; after completion, the specific model may be used to classify the target logistics objects to be classified. Specifically, referring to FIG.
3, the following steps may be included: S301: Determine the text description information of the target logistics object to be classified, and process the text description information to determine the target feature words it contains. From this step onward, the process is mainly that of predicting the code of a specific target logistics object using the above coding classification model. Specifically, the text description information of the target logistics object to be classified can be determined first, where the text description information can be obtained from information such as the title of the logistics object. In a specific implementation, if an interface-based tool is used, as shown in FIG. 4, an entry for inputting the text description information of the target logistics object may be provided in the interface, for example, an input box. Alternatively, an entry for batch importing the text description information of multiple target logistics objects can also be provided, so that users can organize the text description information of the logistics objects to be classified in advance through Excel tables or other means, and name the data columns in the table with specified field names, and so on. After that, the text description information of each logistics object recorded in the table can be imported into the tool through the above batch-operation entry. Whether the text description information of the target logistics object is entered individually or imported in batches, the specific target logistics object can be a logistics object awaiting customs declaration, for example, information such as the title of a target logistics object extracted from a specific cross-border order.
S302: Generate the feature word vector corresponding to the target logistics object according to whether the text description information contains each target feature word. After obtaining the text description information of the target logistics object, the text description information can be processed, and the processing method can be the same as that used for the text description information in the training samples. For example, word segmentation can likewise be performed, invalid words filtered out, and the remaining valid words determined as target feature words. Then, the feature word vector corresponding to the target logistics object can be generated according to whether the text description information contains each target feature word. Specifically, the feature word vector corresponding to the target logistics object may be generated according to whether the text description information of the target logistics object contains the feature word on each serial number. For example, if the text description information of the target logistics object includes the feature words numbered 1, 5, 8, 27, and so on, then the element values corresponding to those serial numbers may be set to 1, or to a preset initial weight value, while the element values corresponding to the other serial numbers are 0. Of course, in practical applications, the text description information of the target logistics object to be predicted may include vocabulary that did not appear during training. Such vocabulary can be filtered out and not entered into the specific classification model. However, after this prediction is completed, whether the word is related to HScode classification can also be determined according to the named entity information of the word.
If it is related, it can also be added as a feature word to the corresponding feature word set, and the model can then be retrained, and so on. It should be noted here that the dimension of the feature word vector generated for the target logistics object is consistent with the number of feature words in the feature word set used during training. For example, if during training the feature words in all training samples are collected together to form a feature word set, and the number of feature words included is N, then the feature word vector corresponding to the target logistics object to be predicted can also be an N-dimensional vector. In addition, if the HScodes are grouped during training and the feature words in the training samples corresponding to the HScodes in each group are aggregated separately, the number of feature words in each group will also be reduced. In this case, before generating the feature word vector for the target logistics object, the group to which the target logistics object belongs can first be determined. For example, if the HScodes are grouped according to the category system of an online sales system, the corresponding HScode group can be determined according to the category of the target logistics object under that category system. Then, the feature word set of that HScode group can be used to determine the feature word vector of the current target logistics object. S303: Input the feature word vector into the coding classification model to obtain the corresponding classification feature information. After determining the feature word vector corresponding to the target logistics object, it can be input into the coding classification model to obtain the specific classification feature information.
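Before the vector is input into the model, step S302 builds it as described above: segment the title, drop invalid and unseen words, and set the element on each matched serial number. A minimal sketch, with word segmentation approximated by a pre-tokenized list and an invented feature word set and stopword list:

```python
def build_feature_vector(text_tokens, feature_index, stopwords=frozenset()):
    """Map a segmented product title onto the N-dimensional feature word
    vector used at prediction time. Invalid words and vocabulary unseen
    during training are dropped, mirroring the filtering described above."""
    vec = [0] * len(feature_index)
    for token in text_tokens:
        if token in stopwords:
            continue                       # invalid word, filtered out
        idx = feature_index.get(token)     # unseen vocabulary is skipped
        if idx is not None:
            vec[idx] = 1                   # or a preset initial weight value
    return vec

# Illustrative feature word set with serial numbers assigned during training.
feature_index = {"wool": 0, "sweater": 1, "mens": 2, "laptop": 3}
vec = build_feature_vector(["mens", "wool", "sweater", "of"],
                           feature_index, stopwords={"of"})
# vec -> [1, 1, 1, 0]: three matched feature words, "laptop" absent.
```

With grouping enabled, `feature_index` would be the (smaller) feature word set of the target object's HScode group rather than the global set.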
For example, in a specific implementation, in the case of using a logistic regression model for HScode classification, the feature word vector may be input into the coding classification model to determine the probability that the target logistics object belongs to the category corresponding to each HScode; in addition, classification suggestion information can be provided according to the probability. Specifically, the feature word vector of the target logistics object can be multiplied with the feature word weight vector corresponding to each HScode (possibly also adjusted by an offset value, etc.) to obtain the probability value that the target logistics object belongs to the category corresponding to each HScode. If group training was performed during training, then when the feature word vector is input into the classification model, the group information corresponding to the target logistics object may also be input into the classification model. In this way, the feature word vector of the target logistics object only needs to be calculated against the feature word weight vectors corresponding to the HScodes in that group, and there is no need to calculate the probability for all HScodes separately, which saves computation. After calculating the probability that the target logistics object belongs to the category corresponding to each HScode, the corresponding classification suggestion information can be returned; for example, one or more HScodes with a probability higher than a preset threshold can be returned, so that the user can determine the specific HScode for the target logistics object based on this recommended result.
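The probability computation and suggestion step just described might be sketched as follows: a sigmoid over the dot product of the feature word vector with each code's weight vector (plus an offset), restricted to the object's HScode group when one is given, returning codes above a threshold. The weight values, bias values, and codes below are invented for illustration.

```python
import math

def predict_codes(vec, code_weights, group_codes=None, threshold=0.5):
    """Score a feature word vector against each HScode's (weights, bias)
    pair and return the codes whose probability exceeds the threshold.
    When group_codes is given, only codes in that group are scored,
    saving computation as described above. Illustrative sketch only."""
    results = {}
    for code, (w, b) in code_weights.items():
        if group_codes is not None and code not in group_codes:
            continue                      # outside the object's group
        z = sum(wi * xi for wi, xi in zip(w, vec)) + b
        results[code] = 1.0 / (1.0 + math.exp(-z))
    return {c: p for c, p in results.items() if p > threshold}

weights = {
    "6110110000": ([3.0, -1.0], 0.0),   # invented per-code weight vectors
    "8471300000": ([-2.0, 4.0], -1.0),
}
suggestions = predict_codes([1, 0], weights, group_codes={"6110110000"})
# Only the in-group code is scored; its probability exceeds the threshold,
# so it is returned as the classification suggestion.
```

Returning all codes above the threshold, rather than only the top one, matches the text's point that the result can serve as a suggestion for the user to confirm.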
In short, through the embodiment of the present application, the coding classification model can be determined in advance, so that for a target logistics object to be classified, its text description information can be obtained and processed, the target feature words it contains determined, the feature word vector corresponding to the target logistics object generated according to whether the text description information contains each target feature word, and the feature word vector then input into the coding classification model to obtain the corresponding classification feature information. In this way, automatic classification of logistics objects can be achieved without relying on manual classification, and efficiency and accuracy can therefore be improved. In an alternative embodiment, through the collection and processing of training samples and machine learning training, a classification model for each HScode can be obtained, which can specifically be represented by a feature word weight vector in which the discriminant weight value of each feature word for the associated HScode is recorded. In this way, when predicting for a certain target data object, the text description information of the target data object can be subjected to word segmentation and other processing to determine the feature words it contains and to generate a feature word vector. This feature word vector can then be input into the previously trained classification model, so that the probability that the target logistics object is classified into each HScode can be calculated and suggestion information given accordingly, for example, one or several suggested HScodes.
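The training-side flow recapped above, from sample collection through word segmentation, feature word aggregation, serial-number assignment, and vectorization, can be sketched end to end. Whitespace splitting stands in for real Chinese word segmentation, and the samples, codes, and stopword list are invented; the final model-training step is omitted.

```python
# Invented (title, HScode) training samples and stopword list.
samples = [
    ("mens wool sweater", "6110110000"),
    ("usb laptop cable of", "8471300000"),
]
stopwords = {"of"}

# Segment each title and filter out invalid words to obtain feature words.
tokenized = [([w for w in text.split() if w not in stopwords], code)
             for text, code in samples]

# Aggregate and deduplicate feature words across all samples, then assign
# each feature word a serial number.
feature_words = sorted({w for words, _ in tokenized for w in words})
index = {w: i for i, w in enumerate(feature_words)}

# Generate one feature word vector per sample: the element on a feature
# word's serial number is set when the sample contains that word.
vectors = []
for words, code in tokenized:
    v = [0] * len(index)
    for w in words:
        v[index[w]] = 1
    vectors.append((v, code))
# vectors now pairs each N-dimensional vector with its known HScode, ready
# to be fed into a machine learning model per associated code.
```

Each resulting vector has the same dimension N as the aggregated feature word set, which is the consistency property the prediction-time vector must also satisfy.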
In this way, the process of classifying the target data objects no longer depends entirely on experts, which can reduce labor costs; the efficiency of classification is also improved and is not limited by the experience and personal abilities of experts. Embodiment 2 This Embodiment 2 provides a method for generating a coding classification model. Referring to FIG. 5, the method may specifically include: S501: Collect training samples, where each training sample includes a known correspondence between the text description information of a logistics object and a code; here the code may specifically refer to the customs code HScode mentioned above. S502: Perform word segmentation processing on the text description information in the training samples, and filter out invalid words to obtain feature words. S503: Aggregate and deduplicate the feature words obtained from each training sample to obtain a feature word set, and assign a corresponding serial number to each feature word. S504: Generate the feature word vector corresponding to each training sample according to whether each training sample contains the feature word on each serial number. S505: Input the feature word vectors corresponding to the multiple training samples associated with the same code into a preset machine learning model for training, to obtain the classification model corresponding to each code. For the parts not detailed in this second embodiment, reference may be made to the first embodiment above, which will not be repeated here. Corresponding to the first embodiment, an embodiment of the present application also provides a logistics object information processing device. Referring to FIG.
6, the device may specifically include: a target logistics object information determination unit 601, configured to determine the text description information of the target logistics object to be classified and to process the text description information to determine the target feature words it contains; a feature vector generation unit 602, configured to generate the feature word vector corresponding to the target logistics object according to whether the text description information contains each target feature word; and a classification feature information acquisition unit 603, configured to input the feature word vector into the coding classification model to acquire the corresponding classification feature information. The coding classification model includes a logistic regression model, a decision tree model, or a neural network model. If the coding classification model is a logistic regression model, the coding classification model stores the feature word weight vector corresponding to each code. Specifically, the code includes the customs code HScode, and the coding classification model stores the feature word weight vector corresponding to each customs code HScode; the feature word weight vector records the discriminant weight value of each feature word for the associated HScode. If the coding classification model is a decision tree model, the coding classification model stores multiple tree models, and the probability that the target logistics object is classified into the category corresponding to each potential code is determined based on the split thresholds stored in each tree and the characteristics of the feature word vector.
If the coding classification model is a neural network model, the coding classification model has multiple layers of non-linear transformation units; the non-linear transformation units of each layer are connected in series with those of the next layer, and each layer stores feature weights based on the feature word vector or on feature vectors derived from it, so that the probability that the logistics object is classified into the category corresponding to each potential code is obtained through the interaction of the multiple layers of non-linear transformation units. In a specific implementation, the classification feature information acquisition unit may specifically be used to input the feature word vector into the coding classification model and determine the probability that the target logistics object is classified into the category corresponding to each potential code. It can also be used to provide classification suggestion information based on the probability.
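As a toy illustration of the decision tree variant described above, with a split threshold stored per internal node and a category probability per leaf, the following structure and numbers are entirely invented; a real model would hold many such trees and combine their leaf outputs.

```python
# Hypothetical single tree: the internal node tests one feature word's
# element against a stored split threshold; leaves hold the probability
# that the object belongs to the associated code's category.
tree = {
    "feature": 0, "threshold": 0.5,
    "left":  {"leaf": 0.05},   # feature word absent -> low probability
    "right": {"leaf": 0.90},   # feature word present -> high probability
}

def tree_probability(node, x):
    """Walk the tree using the feature word vector x until a leaf."""
    while "leaf" not in node:
        branch = "right" if x[node["feature"]] > node["threshold"] else "left"
        node = node[branch]
    return node["leaf"]

# A vector containing the feature word routes to the high-probability leaf.
p = tree_probability(tree, [1, 0])
```

The neural network variant would instead pass the feature word vector through the stored per-layer weights and non-linearities to produce the same kind of per-code probability.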
The coding classification model is established by the following units: a sample collection unit, used to collect training samples, where each training sample includes a known correspondence between the text description information of a logistics object and an HScode; a feature word determination unit, used to perform word segmentation on the text description information in the training samples and to filter out invalid words to obtain feature words; a feature word aggregation unit, used to aggregate and deduplicate the feature words obtained from each training sample to obtain a feature word set and to assign a corresponding serial number to each feature word; a feature word vector generation unit, used to generate the feature word vector corresponding to each training sample according to whether each training sample contains the feature word on each serial number; and a training unit, used to input the feature word vectors corresponding to the multiple training samples associated with the same HScode into a preset machine learning model for training, to obtain the classification model corresponding to each HScode. In a specific implementation, the feature vector generation unit may specifically be used to generate the feature word vector corresponding to the target logistics object according to whether the text description information of the target logistics object contains the feature word on each serial number. For model training, the device may further include: a data cleaning unit, used to clean the training samples after they are collected, so that the remaining valid training samples are used to train the classification model.
Specifically, the data cleaning unit may be used to: pre-save mapping relationship information between new and old HScodes; and, for training samples in which an old HScode appears, replace it with the new HScode according to the mapping relationship before adding the sample to the training sample set as a valid training sample. Alternatively, the data cleaning unit can also be used to: save a list of disabled HScodes in advance, and delete the training samples in which a disabled HScode appears. Alternatively, the data cleaning unit can also be used to: pre-save a list of split-HScode information, where each piece of split-HScode information includes the HScode before the split and the corresponding multiple HScodes after the split; for training samples containing an HScode from before a split, re-determine the HScode after the split, and then add the samples to the training sample set as valid training samples. In a specific implementation, the device may further include: a vocabulary filtering unit, configured to perform named entity recognition on the words obtained from the word segmentation of the text description information, and to filter out the words irrelevant to the classification of logistics objects according to the named entity recognition result. In addition, for model training, the device may further include: a grouping unit, used, before the feature words obtained from each training sample are aggregated and deduplicated, to group the HScodes according to the category information at one level of the category system of the relevant online sales system to obtain multiple groups, each group including multiple HScodes, so that the feature words are aggregated, the feature vectors generated, and the models trained with each HScode group as a unit.
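The first two cleaning behaviors of the data cleaning unit (remapping changed HScodes and dropping disabled ones) might be sketched as follows. The split-code case is omitted since, as described, it requires the HScode to be re-determined rather than looked up; the sample texts, codes, and mapping contents are invented for illustration.

```python
def clean_samples(samples, old_to_new, disabled):
    """Sketch of the data cleaning step: drop samples whose HScode has been
    disabled, and remap samples whose HScode was changed, so that only
    valid samples enter the training set. Illustrative only."""
    cleaned = []
    for text, code in samples:
        if code in disabled:
            continue                       # stale: code no longer in use
        code = old_to_new.get(code, code)  # remap changed codes
        cleaned.append((text, code))
    return cleaned

samples = [("wool sweater", "6110110000"), ("old gadget", "9999999999")]
cleaned = clean_samples(samples,
                        old_to_new={"6110110000": "6110110011"},
                        disabled={"9999999999"})
# The changed code is remapped to its new value; the disabled sample is dropped.
```

This mirrors the tariff-revision example in the text, where 6110110000 is replaced by 6110110011 according to the saved mapping before the sample is kept as valid.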
Specifically, the classification model may also store the correspondence between each group and its HScodes. During prediction, the device may further include: a group determination unit, used to determine the corresponding HScode group for the target logistics object according to the category to which it belongs under the category system of the online sales system; the prediction unit may specifically be used to input the HScode group corresponding to the target logistics object, together with the feature word vector, into the classification model, in order to determine the probability that the target logistics object belongs to the category corresponding to each HScode under that group. Corresponding to the second embodiment, an embodiment of the present application also provides an apparatus for generating a customs code classification model. Referring to FIG. 7, the apparatus may specifically include: a sample collection unit 701, used to collect training samples, where each training sample includes a known correspondence between the text description information of a logistics object and a code; a feature word determination unit 702, used to perform word segmentation processing on the text description information in the training samples and to filter out invalid words to obtain feature words; a feature word aggregation unit 703, used to aggregate and deduplicate the feature words obtained from each training sample to obtain a feature word set and to assign a corresponding serial number to each feature word; a feature word vector generation unit 704, used to generate the feature word vector corresponding to each training sample according to whether each training sample contains the feature word on each serial number; and a training unit 705, used to input the feature word vectors corresponding to the multiple training samples associated with the same code into a preset machine learning model for training, to obtain the coding classification model corresponding to each code;
the coding classification model stores the feature word weight vector corresponding to each code, and the feature word weight vector records the discriminant weight value of each feature word for the associated code. In addition, corresponding to the first embodiment of the present application, an embodiment of the present application further provides a computer system, including: one or more processors; and a memory associated with the one or more processors, the memory being used to store program instructions which, when read and executed by the one or more processors, perform the following operations: determining the text description information of the target logistics object to be classified and processing the text description information to determine the target feature words it contains; generating the feature word vector corresponding to the target logistics object according to whether the text description information contains each target feature word; and inputting the feature word vector into the coding classification model to obtain the corresponding classification feature information. FIG. 8 exemplarily shows the architecture of a computer system, which may specifically include a processor 810, a video display card 811, a disk drive 812, an input/output interface 813, a network interface 814, and a memory 820. The processor 810, the video display card 811, the disk drive 812, the input/output interface 813, and the network interface 814 can be connected to the memory 820 through a communication bus 830. The processor 810 can be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute the relevant programs to realize the technical solution provided by this application.
The memory 820 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, and the like. The memory 820 may store an operating system 821 for controlling the operation of the computer system 800 and a basic input/output system (BIOS) 822 for controlling the low-level operation of the computer system 800. In addition, a web browser 823, a data storage management system 824, a classification processing system 825, and so on can also be stored. The foregoing classification processing system 825 may be an application program that specifically implements the foregoing steps of the embodiments of the present application. In short, when the technical solution provided by the present application is implemented by software or firmware, the relevant program code is stored in the memory 820 and is called and executed by the processor 810. The input/output interface 813 is used to connect input/output modules to realize information input and output. The input/output module can be configured as a component in the device (not shown in the figure), or can be externally connected to the device to provide the corresponding functions. The input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and the like, and the output device may include a display, a speaker, a vibrator, an indicator light, and the like. The network interface 814 is used to connect a communication module (not shown in the figure) to realize communication and interaction between this device and other devices. The communication module can realize communication through wired means (such as USB or a network cable), or through wireless means (such as a mobile network, WiFi, or Bluetooth).
The bus 830 includes a path for transmitting information between the various components of the device (such as the processor 810, the video display card 811, the disk drive 812, the input/output interface 813, the network interface 814, and the memory 820). In addition, the computer system 800 can also obtain information on specific collection conditions from a virtual resource object collection condition information database 841 for use in condition judgment, and so on. It should be noted that although the above device shows only the processor 810, the video display card 811, the disk drive 812, the input/output interface 813, the network interface 814, the memory 820, the bus 830, and so on, in a specific implementation the device may also include other elements necessary for normal operation. In addition, those skilled in the art will understand that the above-mentioned device may also include only the elements necessary to implement the solution of the present application, and need not include all the elements shown in the figures. From the description of the above embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the existing technology, can be embodied in the form of a software product, and the computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in some parts of the embodiments of the present application. The embodiments in this specification are described in a progressive manner, and the same or similar parts of the embodiments can be referred to each other.
Each embodiment focuses on its differences from the other embodiments. In particular, for the system or system embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and reference may be made to the description of the method embodiments for the relevant parts. The system and system embodiments described above are only schematic: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative labor. The logistics object information processing method, device, and computer system provided by this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the descriptions of the above embodiments are only intended to help in understanding the method of the present application and its core idea; meanwhile, for those of ordinary skill in the art, according to the idea of the present application, there will be changes in the specific implementation and the scope of application. In summary, the content of this specification should not be understood as limiting this application.