
CN111611211A - File import and archive method, electronic device and storage medium - Google Patents

File import and archive method, electronic device and storage medium

Info

Publication number
CN111611211A
Authority
CN
China
Prior art keywords
file
files
compressed package
archived
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010346888.XA
Other languages
Chinese (zh)
Inventor
陈璐琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202010346888.XA
Publication of CN111611211A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/168 Details of user interfaces specifically adapted to file systems, e.g. browsing and visualisation, 2d or 3d GUIs
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to intelligent decision-making in artificial intelligence and provides a file import and archive method, which includes: transmitting files in the form of a compressed package, the compressed package including one or more files; reading all content items of the transmitted compressed package and identifying valid parameters, the content items including the name of the compressed package, the compressed package format, the names and types of the files in the compressed package, and the source identifier of the compressed package, and the valid parameters including file names and/or file types; judging, according to the identified valid parameters, whether each file name conforms to the naming rules; displaying the import failure and its reason for files that do not conform to the naming rules; and archiving files that conform to the naming rules to the corresponding classification items, the classification items being classification labels assigned according to file names. The invention also provides an electronic device and a storage medium. The invention can archive files automatically. The invention also relates to blockchain technology, and the compressed package is stored in a blockchain.

Description

File import and archive method, electronic device and storage medium

Technical field

The invention relates to the technical field of intelligent decision-making in artificial intelligence, and more particularly to a file import and archive method, an electronic device and a storage medium.

Background

In the prior art, the archive upload interface typically requires users to transfer the required files to an asset securitization (asset-backed securities, ABS) system one at a time over many uploads. This interaction greatly increases user effort and needlessly lengthens the operation. On the system side, the archived data cannot be analyzed in a unified and complete manner, so data sent across multiple transfers may be missed or read incompletely. Users are also frustrated that they cannot transfer everything effectively in one pass. For example, an ABS system needs to archive user-uploaded materials into different directories, but only one file can be archived to its directory at a time; archiving multiple files means reading and archiving repeatedly, which produces many repetitive, mechanical interactions. In addition, users cannot tell how many files have been uploaded or whether an upload succeeded; that is, they cannot quickly learn the current progress or get system feedback. Prior-art archiving therefore cannot archive multiple files in a single unified operation and cannot show why a file could not be archived, so manual customer service is needed and the user experience is poor. This is especially so when archiving R&D documents for complex products, where it easily leads to missed functional iterations, higher product development cost and lower revenue.

Summary of the invention

In view of the above problems, the purpose of the present invention is to provide a file import and archive method, an electronic device and a storage medium capable of archiving files automatically.

To achieve the above purpose, the present invention provides an electronic device comprising a memory and a processor, the memory storing a file import and archive program which, when executed by the processor, implements the following steps:

transmitting files in the form of a compressed package, the compressed package including one or more files;

reading all content items of the transmitted compressed package and identifying valid parameters, the content items including the name of the compressed package, the compressed package format, the names and types of the files in the compressed package, and the source identifier of the compressed package, and the valid parameters including file names and/or file types;

judging, according to the identified valid parameters, whether each file name conforms to the naming rules;

displaying the import failure and its reason for files that do not conform to the naming rules;

archiving files that conform to the naming rules to the corresponding classification items, the classification items being classification labels assigned according to file names.

In addition, to achieve the above purpose, the present invention also provides a file import and archive method, comprising:

transmitting files in the form of a compressed package, the compressed package including one or more files;

reading all content items of the transmitted compressed package and identifying valid parameters, the content items including the name of the compressed package, the compressed package format, the names and types of the files in the compressed package, and the source identifier of the compressed package, and the valid parameters including file names and/or file types;

judging, according to the identified valid parameters, whether each file name conforms to the naming rules;

displaying the import failure and its reason for files that do not conform to the naming rules;

archiving files that conform to the naming rules to the corresponding classification items, the classification items being classification labels assigned according to file names.
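The steps of the method can be illustrated with a minimal Python sketch. It is not the patented implementation: the naming rule `NAME_RULE` (a category prefix, an underscore, a description and an allowed extension), the `ImportResult` container and the helper `build_demo_package` are all hypothetical and chosen only for illustration, and the package is assumed to be a zip file read with the standard `zipfile` module.

```python
import io
import re
import zipfile
from dataclasses import dataclass, field

# Hypothetical naming rule: "<category>_<description>.<ext>" with an allowed extension.
NAME_RULE = re.compile(r"^(?P<category>[A-Za-z]+)_[\w-]+\.(?P<ext>pdf|docx|xlsx)$")


@dataclass
class ImportResult:
    archived: dict = field(default_factory=dict)  # classification item -> file names
    failures: list = field(default_factory=list)  # (file name, reason) pairs


def import_archive(data: bytes) -> ImportResult:
    """Read every entry of the package, check its name, then archive or report it."""
    result = ImportResult()
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for info in package.infolist():
            if info.is_dir():
                continue
            name = info.filename.rsplit("/", 1)[-1]  # ignore folders inside the package
            match = NAME_RULE.match(name)
            if match is None:
                result.failures.append((name, "file name does not follow the naming rule"))
                continue
            category = match.group("category").lower()
            result.archived.setdefault(category, []).append(name)
    return result


def build_demo_package() -> bytes:
    """Build a small in-memory zip so the sketch can be run without any files on disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("contract_loan-2020.pdf", b"demo")
        package.writestr("report_q1.xlsx", b"demo")
        package.writestr("notes.txt", b"demo")
    return buffer.getvalue()
```

Calling `import_archive(build_demo_package())` archives the two conforming files under the items `contract` and `report` and lists `notes.txt` as a failure with its reason, which is the information the failure display step needs.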

In one embodiment, the step of displaying the import failure and its reason for files that do not conform to the naming rules includes: displaying, in a pop-up window, the names of all files that failed to import together with the reasons.

Preferably, the pop-up window includes a first, a second and a third pop-up window. The first pop-up window displays the import failure and its reason on the client that uploaded the files; the second offers that client the option of ignoring the errors and importing the files; the third offers the client the option of uploading again.

Further preferably, the step also includes: when the number of files that failed to import is less than a set number, showing the second pop-up window; when it is not less than the set number, showing the third pop-up window.
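The pop-up selection rule can be sketched in a few lines of Python. The `Popup` enum and `choose_popups` function are hypothetical names; the sketch also assumes that the first pop-up window is shown whenever at least one file fails, which the text does not state explicitly.

```python
from enum import Enum


class Popup(Enum):
    FAILURES = "first: list failed files and reasons"
    IGNORE_AND_IMPORT = "second: offer to ignore errors and import"
    REUPLOAD = "third: offer to upload again"


def choose_popups(failed_count: int, threshold: int) -> list:
    """Return the pop-up windows to show for a given number of failed files."""
    if failed_count == 0:
        return []  # nothing failed, so nothing to report
    # Fewer failures than the set number: let the user ignore them and import.
    # Otherwise: ask the user to upload again.
    follow_up = Popup.IGNORE_AND_IMPORT if failed_count < threshold else Popup.REUPLOAD
    return [Popup.FAILURES, follow_up]
```

With a set number of 3, two failed files lead to the second pop-up window and three or more lead to the third.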

In one embodiment, the compressed package is stored in a blockchain, and the step of archiving files that conform to the naming rules to the corresponding classification items includes:

extracting the keywords of the files in the compressed package according to the naming rules, and obtaining the feature value of each keyword through a vector space model, the feature values of all keywords of a file constituting that file's feature vector;

constructing a classification model based on the feature vectors;

inputting the feature vectors of the files in the compressed package into the classification model to predict their classification, and archiving the files in the compressed package to the corresponding classification items.

Preferably, the step of constructing the classification model based on the feature vectors includes:

selecting, according to the number of shared keywords, a set number of archived files that match each file, as that file's training set;

forming the feature matrix of the training set from the feature vectors of the archived files in each file's training set, and forming the classification item matrix of the training set from the classification items of those archived files;

obtaining, from the feature matrix and the classification item matrix of the file's training set, the keyword archive probability of each keyword of the file being archived under each classification item;

selecting, for each keyword, the classification item with the highest keyword archive probability as the best classification item of that keyword;

obtaining, from the best classification item of each keyword of the file and the corresponding keyword archive probability, the file archive probability of the file belonging to each classification item, thereby constructing the classification model,

wherein the step of archiving the files in the compressed package to the corresponding classification items includes:

displaying the classification items in descending order of file archive probability for the client to choose from, or archiving the file directly to the classification item with the highest file archive probability.
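The text does not give the formulas for the keyword archive probability or the file archive probability, so the following Python sketch fills them in with explicit assumptions. The keyword archive probability is taken as the share of a keyword's total feature weight in the training set that falls under each classification item. The file archive probability is taken as the normalised sum of the best-item probabilities of the file's keywords. All function names and the dictionary layout of an archived file (`features`, `category`) are hypothetical.

```python
from collections import defaultdict


def select_training_set(keywords, archived, k):
    """Keep the k archived files that share the most keywords with the new file."""
    wanted = set(keywords)
    ranked = sorted(archived, key=lambda item: len(wanted & set(item["features"])), reverse=True)
    return ranked[:k]


def keyword_probabilities(keywords, training_set):
    """Assumed rule: P(item | keyword) = keyword weight under the item / total keyword weight."""
    probabilities = {}
    for word in keywords:
        weight_by_item = defaultdict(float)
        for archived_file in training_set:
            weight_by_item[archived_file["category"]] += archived_file["features"].get(word, 0.0)
        total = sum(weight_by_item.values())
        if total > 0:  # skip keywords that never occur in the training set
            probabilities[word] = {item: w / total for item, w in weight_by_item.items()}
    return probabilities


def rank_categories(keywords, archived, k=3):
    """Return (classification item, file archive probability) pairs, highest first."""
    training_set = select_training_set(keywords, archived, k)
    scores = defaultdict(float)
    for by_item in keyword_probabilities(keywords, training_set).values():
        best_item = max(by_item, key=by_item.get)  # best classification item of the keyword
        scores[best_item] += by_item[best_item]
    total = sum(scores.values())
    if total == 0:
        return []
    return sorted(((item, s / total) for item, s in scores.items()),
                  key=lambda pair: pair[1], reverse=True)
```

The ranked list can either be shown to the client for selection or used directly by archiving to its first entry, matching the two alternatives described above.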

In one embodiment, the content items further include an authorization identifier of the compressed package or an authorization identifier of each file in the compressed package, the authorization identifier being a unique identifier of the client.

Further preferably, the step of constructing the feature vectors of the archived files in each file's training set includes:

obtaining the term frequency of each keyword of the archived files in each file's training set by the following formula

$$\mathrm{TF}(W'_m, d'_n) = \frac{\mathrm{count}(W'_m, d'_n)}{\mathrm{count}(d'_n)}$$

where TF(W'_m, d'_n) is the term frequency of the m-th keyword W'_m in the archived file d'_n, count(W'_m, d'_n) is the number of times the keyword W'_m occurs in the archived file d'_n, and count(d'_n) is the total number of occurrences of all keywords in the archived file d'_n;

obtaining the inverse document frequency of each keyword of the archived files in each file's training set by the following formula

$$\mathrm{IDF}(W'_m) = \log\frac{n}{\mathrm{count}(d_m)}$$

where IDF(W'_m) is the inverse document frequency of the keyword W'_m, d_m denotes the archived files in which the keyword W'_m occurs, count(d_m) is the number of archived files in which the keyword W'_m occurs, and n is the number of archived files in the selected training set;

obtaining, by the following formula, the term frequency-inverse document frequency of each keyword of the archived files in each file's training set as the feature value of that keyword, the feature values of the keywords of each archived file constituting its feature vector

$$\mathrm{TFIDF}(W'_m, d'_n) = \mathrm{TF}(W'_m, d'_n) \cdot \mathrm{IDF}(W'_m)$$

where TFIDF(W'_m, d'_n) is the term frequency-inverse document frequency of the keyword W'_m in the archived file d'_n.
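The feature value computation can be sketched in Python as follows. The term frequency and the product TF × IDF follow the definitions above. The inverse document frequency is assumed to be the plain logarithm of n over the number of files containing the keyword, because the original formula image is not available; the function names and the choice to return 0 for a keyword absent from the training set are likewise assumptions.

```python
import math
from collections import Counter


def term_frequency(keyword, file_keywords):
    """TF: occurrences of the keyword divided by all keyword occurrences in the file."""
    return Counter(file_keywords)[keyword] / len(file_keywords)


def inverse_document_frequency(keyword, training_files):
    """IDF (assumed form): log(n / number of training files containing the keyword)."""
    containing = sum(1 for keywords in training_files if keyword in keywords)
    if containing == 0:
        return 0.0  # assumption: a keyword absent from the training set carries no weight
    return math.log(len(training_files) / containing)


def feature_vector(file_keywords, training_files, vocabulary):
    """One TF-IDF feature value per vocabulary keyword, in vocabulary order."""
    return [
        term_frequency(word, file_keywords) * inverse_document_frequency(word, training_files)
        for word in vocabulary
    ]
```

For example, with the training files `[["loan", "contract"], ["loan", "report"], ["audit", "report", "report"]]`, the keyword `report` in the third file has TF = 2/3 and, under the assumed formula, IDF = log(3/2).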

In addition, to achieve the above purpose, the present invention also provides a computer-readable storage medium containing a file import and archive program which, when executed by a processor, implements the steps of the above file import and archive method.

The file import and archive method, electronic device and storage medium of the present invention transmit a single compressed package, read all content items in it, identify the valid parameters, and place files of different categories into their corresponding classification items in one pass, achieving automatic matching and dynamic allocation of the files in the package. Files that do not conform to the naming rules are shown with the import failure and its reason, so files that were not read successfully are reported to the client promptly. Applying techniques such as "artificial intelligence matching" to each business scenario reduces repetitive, single-purpose, multi-step interactions.

Description of the drawings

Fig. 1 is a schematic diagram of the application environment of a preferred embodiment of the file import and archive method of the present invention;

Fig. 2 is a schematic module diagram of a preferred embodiment of the file import and archive program in Fig. 1;

Fig. 3 is a flowchart of a preferred embodiment of the file import and archive method of the present invention.

Detailed description

It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.

Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

The present invention provides a file import and archive method applied to an electronic device. Fig. 1 is a schematic diagram of the application environment of a preferred embodiment of the method.

In this embodiment, the electronic device 1 may be a terminal client with computing capability, such as a server, a mobile phone, a tablet computer, a portable computer or a desktop computer.

The electronic device 1 includes a memory 11, a processor 12, a network interface 13 and a communication bus 14.

The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card or card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as its hard disk. In other embodiments, the readable storage medium may be an external memory of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card fitted to the electronic device 1.

In this embodiment, the readable storage medium of the memory 11 is generally used to store the file import and archive program 10 installed on the electronic device 1, among other things. The memory 11 may also be used to temporarily store data that has been output or is to be output.

In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor or another data processing chip, used to run program code or process data stored in the memory 11, for example to execute the file import and archive program 10.

The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic clients.

The communication bus 14 provides the connections and communication between these components.

Fig. 1 shows only the electronic device 1 with components 11-14; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.

Optionally, the electronic device 1 may also include a user interface. The user interface may include an input unit such as a keyboard, a voice input device with speech recognition capability such as a microphone, and a voice output device such as a speaker or headphones; optionally it may also include a standard wired interface and a wireless interface.

Optionally, the electronic device 1 may also include a display, which may also be called a display screen or display unit. In some embodiments the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch display, or the like. The display is used to show information processed in the electronic device 1 and to present a visual user interface.

Optionally, the electronic device 1 also includes a touch sensor. The area the touch sensor provides for the user's touch operations is called the touch area. The touch sensor may be a resistive touch sensor, a capacitive touch sensor, or the like; it may be a contact-type or a proximity-type touch sensor; and it may be a single sensor or multiple sensors, for example arranged in an array.

Optionally, the electronic device 1 may also include logic gate circuits, sensors, audio circuits and the like, which are not described further here.

In the device embodiment shown in Fig. 1, the memory 11, as a computer storage medium, may contain an operating system and the file import and archive program 10; when the processor 12 executes the file import and archive program 10 stored in the memory 11, the following steps are implemented:

transmitting files in the form of a compressed package, the compressed package including one or more files;

reading all content items of the transmitted compressed package and identifying valid parameters, the content items including the name of the compressed package, the compressed package format, the names and types of the files in the compressed package, and the source identifier of the compressed package, and the valid parameters including file names and/or file types;

judging, according to the identified valid parameters, whether each file name conforms to the naming rules;

displaying the import failure and its reason for files that do not conform to the naming rules;

archiving files that conform to the naming rules to the corresponding classification items, the classification items being classification labels assigned according to file names. It should be emphasized that, to further ensure the privacy and security of the compressed package, the compressed package may also be stored in a node of a blockchain.

In other embodiments, the file import and archive program 10 may be divided into one or more modules, which are stored in the memory 11 and executed by the processor 12 to carry out the present invention. A module in the sense of the present invention is a series of computer program instruction segments capable of performing a specific function. Fig. 2 is a functional module diagram of a preferred embodiment of the file import and archive program 10 in Fig. 1. The file import and archive program 10 may be divided into a transmission module 110, a parameter reading module 120, a judgment module 130, a display module 140 and an archive module 150, wherein:

the transmission module 110 transmits files in the form of a compressed package, the compressed package including one or more files;

the parameter reading module 120 reads all content items of the transmitted compressed package and identifies valid parameters, the content items including the name of the compressed package, the compressed package format, the names and types of the files in the compressed package, and the source identifier of the compressed package, and the valid parameters including file names and/or file types;

the judgment module 130 judges, according to the identified valid parameters, whether each file name conforms to the naming rules; if it does not, the module sends a signal to the display module 140, and if it does, the module sends a signal to the archive module 150;

the display module 140 displays the import failure and its reason for files that do not conform to the naming rules;

the archive module 150 archives files that conform to the naming rules to the corresponding classification items, the classification items being classification labels assigned according to file names.

In an optional embodiment, the display module 140 displays, in a pop-up window, the names of all files that failed to import together with the reasons.

Preferably, the display module 140 includes:

a first pop-up window setting unit, which sets a first pop-up window used to display the import failure and its reason on the client that uploaded the files;

a second pop-up window setting unit, which sets a second pop-up window used to offer the client that uploaded the files the option of ignoring the errors and importing the files;

a third pop-up window setting unit, which sets a third pop-up window used to offer the client the option of uploading again.

Further preferably, the display module 140 also includes:

a pop-up window selection unit which, when the number of files that failed to import is less than a set number, sends a signal to the second pop-up window setting unit so that the second pop-up window is shown on the client, and, when the number is not less than the set number, sends a signal to the third pop-up window setting unit so that the third pop-up window is shown on the client.

In an optional embodiment, the parameter reading module 120 includes:

a reading unit, which reads the name and format of the compressed package;

a storage unit, which stores the compressed package that has been read as a file;

a format conversion unit, which converts the stored compressed package into a set compression format;

a loop unit, which iterates over the converted compressed package to obtain each file in it.
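The four units can be sketched with the Python standard library. This is only an illustration under stated assumptions: the set compression format is taken to be zip (the text does not name one), the package format is inferred from the file name suffixes, and the function names are hypothetical. `shutil.unpack_archive` and `shutil.make_archive` handle the repacking.

```python
import shutil
import zipfile
from pathlib import Path

TARGET_FORMAT = "zip"  # assumed "set compression format"; the text does not name one


def read_name_and_format(package_path: Path) -> tuple:
    """Reading unit: return the package name and its format, taken from the suffixes."""
    suffixes = "".join(package_path.suffixes).lstrip(".")
    name = package_path.name[: -len(suffixes) - 1] if suffixes else package_path.name
    return name, suffixes


def store_package(data: bytes, filename: str, work_dir: Path) -> Path:
    """Storage unit: store the uploaded package as a file."""
    path = work_dir / filename
    path.write_bytes(data)
    return path


def convert_to_target(package_path: Path, work_dir: Path) -> Path:
    """Format conversion unit: repack any archive shutil can unpack as a zip file."""
    if zipfile.is_zipfile(package_path):
        return package_path  # already in the target format
    name, _ = read_name_and_format(package_path)
    extracted = work_dir / f"{name}_extracted"
    shutil.unpack_archive(str(package_path), str(extracted))
    return Path(shutil.make_archive(str(work_dir / name), TARGET_FORMAT, root_dir=str(extracted)))


def iter_files(zip_path: Path):
    """Loop unit: yield (file name, content) for every file in the converted package."""
    with zipfile.ZipFile(zip_path) as package:
        for info in package.infolist():
            if not info.is_dir():
                yield info.filename, package.read(info)
```

Converting to a single format first means the loop unit needs only one reader, at the cost of one extra unpack and repack for packages that arrive in another format. Inferring the format from suffixes is deliberately simplistic; a name such as `report.v2.zip` would need a stricter check.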

In one embodiment, the archiving module 150 includes:

A feature vector obtaining unit, which extracts the keywords of each file according to the naming rule and obtains the feature value of each keyword of the files in the compressed package through a vector space model, the feature values of all keywords of a file constituting that file's feature vector;

A model building unit, which builds a classification model based on the feature vectors;

An archiving unit, which uses the classification model to predict the classification item of each file in the compressed package and archives each file to the corresponding classification item.

Preferably, the model building unit includes:

A training set construction subunit, which selects, by the number of shared keywords, a set number of archived files that match each file, as that file's training set;

A matrix construction subunit, in which the feature vectors of the archived files in each file's training set form the feature matrix of the training set, and the classification items of those archived files form the classification item matrix of the training set;

A keyword archiving probability obtaining subunit, which uses the feature matrix and the classification item matrix of the file's training set to obtain, for each keyword of the file, the keyword archiving probability of that keyword being archived under each classification item;

A screening subunit, which selects the classification item with the highest keyword archiving probability as the keyword's best classification item;

A file archiving probability obtaining subunit, which obtains the file archiving probability of the file belonging to each classification item from the best classification item of each keyword of the file and the corresponding keyword archiving probability, thereby constructing the classification model,

wherein the archiving unit displays the classification items in descending order of file archiving probability so that the client can choose the classification item to which the file is archived, or archives the file directly to the classification item with the highest file archiving probability.

The above electronic device can give "timely and visual" feedback on interactive behavior, with the technical layer, the interface layer and the emotional layer working together, so that the user is given a new interface guidance and feedback system covering the whole process and a sound closed-loop experience in both technical and emotional terms.

In addition, the present invention also provides a file import and archiving method. FIG. 3 is a flowchart of a preferred embodiment of the file import and archiving method of the present invention. The method may be performed by an apparatus, which may be implemented in software and/or hardware.

In this embodiment, the file import and archiving method includes:

Step S1: transmitting files in the form of a compressed package, the compressed package including one or more files;

Step S2: reading all content items of the transmitted compressed package and identifying the valid parameters, the content items including the name of the compressed package, the compressed package format, the names and types of the files in the compressed package, and a source identifier of the compressed package (for example, an IP address), and the valid parameters including the file names and/or the file types;

Step S3: determining, from the identified valid parameters, whether each file name conforms to the naming rule. For example, the supported compressed package formats include zip and 7z, the compressed package contains one or more .doc or .docx files named according to the naming convention, and the naming rule is: product name + file naming + 6-digit year, month and day, for example "A Product Plan Specification 20190101";
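The naming check of step S3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the list `DOCUMENT_NAMES`, the function `check_name` and the failure messages are hypothetical. The text states a 6-digit date while its example ends in the eight digits `20190101`, so the sketch assumes an eight-digit `YYYYMMDD` date.

```python
import re
from pathlib import PurePosixPath

# Hypothetical list of accepted document names (the description mentions these as examples).
DOCUMENT_NAMES = ("Plan Specification", "Standard Terms", "Custody Agreement")
ALLOWED_SUFFIXES = {".doc", ".docx"}

# product name + document name + date; the date is ASSUMED to be YYYYMMDD, as in the example.
NAME_PATTERN = re.compile(
    r"^(?P<product>.+?)\s*(?P<document>"
    + "|".join(map(re.escape, DOCUMENT_NAMES))
    + r")\s*(?P<date>\d{8})$"
)


def check_name(filename: str):
    """Return (ok, reason); reason is None when the name conforms to the rule."""
    path = PurePosixPath(filename)
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        return False, "unsupported file format"
    if NAME_PATTERN.match(path.stem) is None:
        return False, "file name does not follow the naming rule"
    return True, None
```

Returning the reason together with the verdict lets the failure display of step S4 reuse the result directly.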

Step S4: displaying the import failure and its reasons for the files that do not conform to the naming rule, preferably in a pop-up window that lists the names of all files that failed to import together with the reasons, the reasons including a duplicate file name, a non-conforming file name, an unsupported file format, and so on;

Step S5: archiving the files that conform to the naming rule to the corresponding classification items, a classification item being a classification label assigned according to the file name. For example, file names include plan specification, standard terms, custody agreement and so on; a file named according to the rule in the example above is read successfully and uploaded to the corresponding file type directory (classification item) on the page. As another example, an asset-backed securities (ABS) system includes five classification items, namely project initiation report, application materials, review materials, registration, and listing, and the files in the compressed package are archived in a single pass to the corresponding items among those five.

In terms of human-computer interaction, the above file import and archiving method replaces repeated one-by-one uploads with a single unified upload, which substantially improves the user's efficiency and also contributes to the productivity of the technical department as a whole.

Preferably, the pop-up window in step S4 includes a first, a second and a third pop-up window. The first pop-up window displays the import failure and its reasons on the client uploading the files; the second displays "Ignore and submit" on that client for the client to select; the third displays "Re-upload" on the client for the client to select. Further preferably, the second pop-up window is displayed when the number of files that failed to import is less than a set number, and the third is displayed when that number is not less than the set number. In addition, when the client selects the third pop-up window, the instruction to receive the files that were read successfully is abandoned, and the client can complete its corrections offline and then submit everything together.
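The choice between the second and third pop-up windows can be sketched as below. The names `ImportFailure` and `choose_popups` and the string labels are hypothetical; the sketch also assumes that no pop-up window is needed when nothing failed, which the text does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImportFailure:
    filename: str
    reason: str  # e.g. "duplicate file name", "unsupported file format"


def choose_popups(failures: List[ImportFailure], threshold: int) -> List[str]:
    """Return the pop-up windows to show for a list of failed imports."""
    if not failures:
        return []  # assumption: nothing to report when every file was imported
    popups = ["first"]  # the first window lists the failed files and the reasons
    # Fewer failures than the set number: offer "Ignore and submit";
    # otherwise offer "Re-upload".
    popups.append("second" if len(failures) < threshold else "third")
    return popups
```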

In one embodiment, step S2 includes:

Reading the name and format of the compressed package;

Storing the read compressed package as a file;

Converting the stored compressed package into a set compression format;

Iterating over the converted compressed package to obtain each file in it, collecting the file name and/or file type of each file, and thereby identifying the valid parameters of the compressed package.
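The reading loop above can be sketched with Python's standard `zipfile` module. The function `read_package` and the shape of its return value are hypothetical; the sketch handles only zip input and omits the format conversion step, since the standard library has no 7z reader.

```python
import io
import zipfile
from pathlib import PurePosixPath


def read_package(package_name: str, data: bytes) -> dict:
    """Return the package name, its format, and (file name, file type) per member."""
    package_format = PurePosixPath(package_name).suffix.lower().lstrip(".")
    if package_format != "zip":
        # Assumption: other formats (e.g. 7z) would first be converted to zip.
        raise ValueError(f"unsupported package format: {package_format!r}")
    entries = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():  # loop over every member of the package
            if info.is_dir():
                continue
            member = PurePosixPath(info.filename)
            entries.append((member.stem, member.suffix.lower().lstrip(".")))
    return {"name": package_name, "format": package_format, "files": entries}
```

The returned `files` list carries exactly the two valid parameters named in step S2, the file name and the file type.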

In one embodiment, step S5 includes:

Extracting the keywords of the files in the compressed package according to the naming rule, and obtaining the feature value of each keyword through a vector space model, the feature values of all keywords of a file constituting that file's feature vector;

Building a classification model based on the feature vectors;

Feeding the feature vectors of the files in the compressed package into the classification model to predict their classification items, and archiving the files in the compressed package to the corresponding classification items.

Preferably, the step of building the classification model includes:

Selecting, by the number of shared keywords, a set number of archived files that match each file, as that file's training set;

The feature vectors of the archived files in each file's training set form the feature matrix of the training set, and the classification items of those archived files form the classification item matrix of the training set:

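$$F = \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1m} \\ f_{21} & f_{22} & \cdots & f_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nm} \end{bmatrix}$$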

$$C = [c_1, c_2, \ldots, c_a]$$

where F is the feature matrix of the training set of file d_i, C is the classification item matrix of the training set of file d_i, m is the total number of keyword feature values, n is the set number of selected archived files, f_nm is the feature value of keyword W'_m of archived file d'_n, and C_a is the a-th classification item;

Using the feature matrix and the classification item matrix of the file's training set, the keyword archiving probability of each keyword of the file being archived under each classification item is obtained according to the following formulas

Figure BDA0002469602220000082

Figure BDA0002469602220000083

where f'_ma is the feature value of keyword W'_m relative to classification item C_a, P(W'_m|C_a) is the keyword archiving probability of keyword W'_m being archived under classification item C_a, and keyword W'_m is also a keyword of file d_i;

Selecting the classification item with the highest keyword archiving probability as the keyword's best classification item;

Obtaining the file archiving probability of the file belonging to each classification item from the best classification item of each keyword of the file and the corresponding keyword archiving probability

Figure BDA0002469602220000091

where C_a is the total number of keywords that take classification item C_a as their best classification item, n_a is the total number of keywords archived under classification item C_a, M is the total number of keywords of file d_i, W_j is the j-th keyword of file d_i, and Pd_iC_a is the file archiving probability of file d_i being archived under classification item C_a;

Displaying the classification items in descending order of file archiving probability for the client to choose from, or archiving the file directly to the classification item with the highest file archiving probability.
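The first and last of the steps above, selecting the training set by shared keywords and ordering the classification items by file archiving probability, can be sketched as follows. The probability formulas themselves are not reproduced in this text, so `rank_categories` takes the probabilities as given input. The function names, the tie-break by file name and the exclusion of archived files with no shared keyword are assumptions.

```python
from typing import Dict, List, Set, Tuple


def select_training_set(file_keywords: Set[str],
                        archived: Dict[str, Set[str]],
                        size: int) -> List[str]:
    """Pick up to `size` archived files sharing the most keywords with the new file."""
    scored = [(len(file_keywords & kws), name) for name, kws in archived.items()]
    scored = [item for item in scored if item[0] > 0]  # assumption: skip files with no overlap
    scored.sort(key=lambda item: (-item[0], item[1]))  # most shared keywords first
    return [name for _, name in scored[:size]]


def rank_categories(probabilities: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order classification items from highest to lowest file archiving probability."""
    return sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
```

The first element of the ranked list is the item used when the file is archived directly, and the whole list is what would be shown when the client chooses.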

Further, preferably, the step of constructing the feature vectors of the archived files in each file's training set includes:

Obtaining the term frequency of each keyword of the archived files in each file's training set by the following formula

$$TF(W'_m, d'_n) = \frac{count(W'_m, d'_n)}{count(d'_n)}$$

where TF(W'_m, d'_n) is the term frequency of the m-th keyword W'_m relative to archived file d'_n, count(W'_m, d'_n) is the number of times keyword W'_m occurs in archived file d'_n, and count(d'_n) is the sum of the numbers of occurrences of all keywords in archived file d'_n;

Obtaining the inverse document frequency of each keyword of the archived files in each file's training set by the following formula

Figure BDA0002469602220000093

where IDF(W'_m) is the inverse document frequency of keyword W'_m, d_m denotes the archived files in which keyword W'_m occurs, and count(d_m) is the number of archived files in which keyword W'_m occurs;

Obtaining, by the following formula, the term frequency-inverse document frequency of each keyword of the archived files in each file's training set as the feature value of that keyword, the feature values of the keywords of each archived file constituting its feature vector

$$TFIDF(W'_m, d'_n) = TF(W'_m, d'_n) \times IDF(W'_m)$$

where TFIDF(W'_m, d'_n) is the term frequency-inverse document frequency of keyword W'_m in archived file d'_n.
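The feature vector construction can be sketched as below. The term frequency follows the definition in the text. The inverse document frequency formula is not reproduced here, so the sketch assumes the common form log(n / count(d_m)), with n the number of archived files in the training set. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def tf(keyword: str, doc: List[str]) -> float:
    """Occurrences of the keyword divided by all keyword occurrences in the file."""
    if not doc:
        return 0.0
    return Counter(doc)[keyword] / len(doc)


def idf(keyword: str, docs: List[List[str]]) -> float:
    """ASSUMED form: log(n / number of archived files containing the keyword)."""
    containing = sum(1 for doc in docs if keyword in doc)
    if containing == 0:
        return 0.0
    return math.log(len(docs) / containing)


def tfidf_vectors(docs: List[List[str]], vocabulary: List[str]) -> List[List[float]]:
    """One row per archived file, one column per keyword: the feature matrix."""
    return [[tf(w, doc) * idf(w, docs) for w in vocabulary] for doc in docs]
```

Under this assumed form, a keyword that occurs in every archived file of the training set gets a feature value of zero, which matches the intent of down-weighting keywords that do not discriminate between classification items.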

In addition, preferably, the method further includes:

Dividing the correct historical archiving data into two parts, one serving as the training set of the classification model, on which the model parameters are trained, and the other serving as the test set for testing the performance of the classification model;

Evaluating and judging the performance of the classification model on the test set, specifically:

Evaluating the precision P of the classification model's file archiving according to the following formula,

Figure BDA0002469602220000094

Evaluating the recall R of the classification model's file archiving according to the following formula,

Figure BDA0002469602220000101

Evaluating the F parameter of the classification model's file archiving according to the following formula

Figure BDA0002469602220000102

where β is a trade-off factor between the precision P and the recall R.
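The evaluation can be sketched with the standard per-category definitions of precision, recall and the F-measure with weight β. The formulas of the original are not reproduced in this text, so these standard forms are an assumption, and the function name is hypothetical.

```python
from typing import List, Tuple


def precision_recall_f(predicted: List[str], actual: List[str],
                       category: str, beta: float = 1.0) -> Tuple[float, float, float]:
    """Precision, recall and F_beta for one classification item (standard definitions)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == category and a == category)
    fp = sum(1 for p, a in pairs if p == category and a != category)
    fn = sum(1 for p, a in pairs if p != category and a == category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    # ASSUMED form: F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    f_beta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta
```

With β greater than 1 the measure weights recall more heavily, and with β less than 1 it weights precision more heavily.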

In an optional embodiment, the files in the compressed package need not be archived individually; the compressed package itself may be archived directly. That is, a set number of keywords is selected from the keywords in the compressed package by number of occurrences, a set number of archived files matching the compressed package is selected by the number of shared keywords as the training set of the compressed package, and the classification item under which the compressed package is archived is obtained by the steps of the previous embodiment.

In an optional embodiment, the content items of the compressed package further include an authorization identifier of the compressed package or an authorization identifier of each file in the compressed package, the authorization identifier being a unique identifier of the client, for example an employee's employee number or ID card number. When the client queries the archived files, the files bearing the client's authorization identifier are displayed to the client.
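The query by authorization identifier can be sketched as a simple filter. The record type `ArchivedFile` and the function `visible_files` are hypothetical, and the sketch assumes one authorization identifier per file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArchivedFile:
    name: str
    category: str
    authorization_id: str  # unique identifier of the client, e.g. an employee number


def visible_files(files: List[ArchivedFile], client_id: str) -> List[ArchivedFile]:
    """Return only the archived files that carry the querying client's identifier."""
    return [f for f in files if f.authorization_id == client_id]
```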

In addition, an embodiment of the present invention also provides a computer-readable storage medium that contains a file import and archiving program which, when executed by a processor, implements the following steps:

Transmitting files in the form of a compressed package, the compressed package including one or more files;

Reading all content items of the transmitted compressed package and identifying the valid parameters, the content items including the name of the compressed package, the compressed package format, the names and types of the files in the compressed package, and a source identifier of the compressed package, and the valid parameters including the file names and/or the file types;

Determining, from the identified valid parameters, whether each file name conforms to the naming rule;

Displaying the import failure and its reasons for the files that do not conform to the naming rule;

Archiving the files that conform to the naming rule to the corresponding classification items, a classification item being a classification label assigned according to the file name. It should be emphasized that, to further ensure the privacy and security of the compressed package, the compressed package may also be stored in a node of a blockchain.

The specific implementations of the computer-readable storage medium of the present invention are substantially the same as those of the file import and archiving method and the electronic device described above, and are not repeated here.

The blockchain referred to in the present invention is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each data block containing information on a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.

It should be noted that, in this document, the terms "include", "comprise" and any variants of them are intended to cover non-exclusive inclusion, so that a process, apparatus, article or method that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article or method. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, apparatus, article or method that includes it.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments. From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product that is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions that cause a terminal client (which may be a mobile phone, a computer, a server, a network client, or the like) to execute the methods described in the embodiments of the present invention.

The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present invention.

Claims (10)

1. A file import and archiving method, characterized by comprising: transmitting files in the form of a compressed package, the compressed package including one or more files; reading all content items of the transmitted compressed package and identifying the valid parameters, the content items including the name of the compressed package, the compressed package format, the names and types of the files in the compressed package, and a source identifier of the compressed package, and the valid parameters including the file names and/or the file types; determining, from the identified valid parameters, whether each file name conforms to the naming rule; displaying the import failure and its reasons for the files that do not conform to the naming rule; and archiving the files that conform to the naming rule to the corresponding classification items, a classification item being a classification label assigned according to the file name.

2. The file import and archiving method according to claim 1, characterized in that the step of displaying the import failure and its reasons for the files that do not conform to the naming rule comprises: displaying, in a pop-up window, the names of all files that failed to import together with the reasons.

3. The file import and archiving method according to claim 2, characterized in that the pop-up window comprises a first pop-up window, a second pop-up window and a third pop-up window, the first pop-up window being used to display the import failure and its reasons on the client uploading the files, the second pop-up window being used to offer the client uploading the files the option of ignoring the errors and importing the files, and the third pop-up window being used to offer the client the option of re-uploading.

4. The file import and archiving method according to claim 3, characterized in that the step of displaying the import failure and its reasons for the files that do not conform to the naming rule further comprises: displaying the second pop-up window when the number of files that failed to import is less than a set number; and displaying the third pop-up window when the number of files that failed to import is not less than the set number.

5. The file import and archiving method according to claim 1, characterized in that the compressed package is stored in a blockchain, and the step of archiving the files that conform to the naming rule to the corresponding classification items comprises: extracting the keywords of the files in the compressed package according to the naming rule, and obtaining the feature value of each keyword through a vector space model, the feature values of all keywords of a file constituting that file's feature vector; building a classification model based on the feature vectors; and feeding the feature vectors of the files in the compressed package into the classification model to predict their classification items, and archiving the files in the compressed package to the corresponding classification items.

6. The file import and archiving method according to claim 5, characterized in that the step of building a classification model based on the feature vectors comprises: selecting, by the number of shared keywords, a set number of archived files that match each file, as that file's training set; the feature vectors of the archived files in each file's training set forming the feature matrix of the training set, and the classification items of those archived files forming the classification item matrix of the training set; obtaining, from the feature matrix and the classification item matrix of the file's training set, the keyword archiving probability of each keyword of the file being archived under each classification item; selecting the classification item with the highest keyword archiving probability as the keyword's best classification item; and obtaining the file archiving probability of the file belonging to each classification item from the best classification item of each keyword of the file and the corresponding keyword archiving probability, thereby constructing the classification model, wherein the step of archiving the files in the compressed package to the corresponding classification items comprises: displaying the classification items in descending order of file archiving probability for the client to choose from, or archiving the file directly to the classification item with the highest file archiving probability.

7. The file import and archiving method according to claim 6, characterized in that the step of constructing the feature vectors of the archived files in each file's training set comprises: obtaining the term frequency of each keyword of the archived files in each file's training set by the following formula

$$TF(W'_m, d'_n) = \frac{count(W'_m, d'_n)}{count(d'_n)}$$

where TF(W'_m, d'_n) is the term frequency of the m-th keyword W'_m relative to archived file d'_n, count(W'_m, d'_n) is the number of times keyword W'_m occurs in archived file d'_n, and count(d'_n) is the sum of the numbers of occurrences of all keywords in archived file d'_n; obtaining the inverse document frequency of each keyword of the archived files in each file's training set by the following formula

Figure FDA0002469602210000022

where IDF(W'_m) is the inverse document frequency of keyword W'_m, d_m denotes the archived files in which keyword W'_m occurs, count(d_m) is the number of archived files in which keyword W'_m occurs, and n is the number of selected archived files in the training set; and obtaining, by the following formula, the term frequency-inverse document frequency of each keyword of the archived files in each file's training set as the feature value of that keyword, the feature values of the keywords of each archived file constituting its feature vector

$$TFIDF(W'_m, d'_n) = TF(W'_m, d'_n) \times IDF(W'_m)$$

where TFIDF(W'_m, d'_n) is the term frequency-inverse document frequency of keyword W'_m in archived file d'_n.

8. The file import and archiving method according to claim 1, characterized in that the content items further include an authorization identifier of the compressed package or an authorization identifier of each file in the compressed package, the authorization identifier being a unique identifier of the client.

9. An electronic device, characterized by comprising a memory and a processor, the memory storing a file import and archiving program which, when executed by the processor, implements the following steps: transmitting files in the form of a compressed package, the compressed package including one or more files; reading all content items of the transmitted compressed package and identifying the valid parameters, the content items including the name of the compressed package, the compressed package format, the names and types of the files in the compressed package, and a source identifier of the compressed package, and the valid parameters including the file names and/or the file types; determining, from the identified valid parameters, whether each file name conforms to the naming rule; displaying the import failure and its reasons for the files that do not conform to the naming rule; and archiving the files that conform to the naming rule to the corresponding classification items, a classification item being a classification label assigned according to the file name.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium contains a file import and archiving program which, when executed by a processor, implements the steps of the file import and archiving method according to any one of claims 1 to 8.
CN202010346888.XA 2020-04-27 2020-04-27 File import and archive method, electronic device and storage medium Pending CN111611211A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010346888.XA CN111611211A (en) 2020-04-27 2020-04-27 File import and archive method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010346888.XA CN111611211A (en) 2020-04-27 2020-04-27 File import and archive method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN111611211A true CN111611211A (en) 2020-09-01

Family

ID=72204445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010346888.XA Pending CN111611211A (en) 2020-04-27 2020-04-27 File import and archive method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN111611211A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112615885A (en) * 2020-12-29 2021-04-06 格美安(北京)信息技术有限公司 Cross-border transmission method based on directory dynamic control and storage device
CN113220635A (en) * 2021-05-11 2021-08-06 深圳市星火数控技术有限公司 File archiving method, device, equipment and computer readable storage medium
CN113609069A (en) * 2021-07-06 2021-11-05 厦门国际银行股份有限公司 Document management method, system, terminal device and storage medium
CN114153794A (en) * 2021-11-24 2022-03-08 金蝶软件(中国)有限公司 Multiple file upload method and related equipment
CN116150302A (en) * 2022-09-23 2023-05-23 华查智能科技(上海)有限公司 Data supplement method and device based on multi-channel mapping
CN120179463A (en) * 2025-03-12 2025-06-20 广州商之杰信息科技有限公司 An incremental data backup platform and method based on Bayesian model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005352770A (en) * 2004-06-10 2005-12-22 Fuji Xerox Co Ltd Document storage device, document storage system, document storage method and program
CN106250385A (en) * 2015-06-10 2016-12-21 埃森哲环球服务有限公司 The system and method for the abstract process of automated information for document
CN106708926A (en) * 2016-11-14 2017-05-24 北京赛思信安技术股份有限公司 Realization method for analysis model supporting massive long text data classification
CN107368526A (en) * 2017-06-09 2017-11-21 北京因果树网络科技有限公司 A kind of data processing method and device
CN110535890A (en) * 2018-05-23 2019-12-03 杭州海康威视系统技术有限公司 The method and apparatus that file uploads
CN110716895A (en) * 2019-09-17 2020-01-21 平安科技(深圳)有限公司 Target data archiving method and device, computer equipment and medium
CN110781303A (en) * 2019-10-28 2020-02-11 佰聆数据股份有限公司 Short text classification method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
潘亮光; 曾太: "Legal consultation text classification method based on naive Bayes" (基于朴素贝叶斯的法律咨询文本分类方法), 电脑编程技巧与维护, no. 08, 31 August 2018 (2018-08-31), pages 2-3 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112615885A (en) * 2020-12-29 2021-04-06 格美安(北京)信息技术有限公司 Cross-border transmission method based on directory dynamic control and storage device
CN112615885B (en) * 2020-12-29 2023-04-18 格美安(北京)信息技术有限公司 Cross-border transmission method based on directory dynamic control and storage device
CN113220635A (en) * 2021-05-11 2021-08-06 深圳市星火数控技术有限公司 File archiving method, device, equipment and computer readable storage medium
CN113609069A (en) * 2021-07-06 2021-11-05 厦门国际银行股份有限公司 Document management method, system, terminal device and storage medium
CN114153794A (en) * 2021-11-24 2022-03-08 金蝶软件(中国)有限公司 Multiple file upload method and related equipment
CN114153794B (en) * 2021-11-24 2025-09-12 金蝶软件(中国)有限公司 Multiple file upload method and related equipment
CN116150302A (en) * 2022-09-23 2023-05-23 华查智能科技(上海)有限公司 Data supplement method and device based on multi-channel mapping
CN120179463A (en) * 2025-03-12 2025-06-20 广州商之杰信息科技有限公司 An incremental data backup platform and method based on Bayesian model

Similar Documents

Publication Publication Date Title
CN111611211A (en) File import and archive method, electronic device and storage medium
CN109947789B (en) Method, device, computer equipment and storage medium for processing data of multiple databases
CN109522746B (en) A data processing method, electronic device and computer storage medium
US20260010750A1 (en) Rf tag operating system with iot connector core
US11048715B1 (en) Automated file acquisition, identification, extraction and transformation
US20220121675A1 (en) Etl workflow recommendation device, etl workflow recommendation method and etl workflow recommendation system
EP1990740A1 (en) Schema matching for data migration
US20210349955A1 (en) Systems and methods for real estate data collection, normalization, and visualization
CN112052396A (en) Course matching method, system, computer equipment and storage medium
WO2021175021A1 (en) Product push method and apparatus, computer device, and storage medium
CN110244934B (en) Method and device for generating product demand document and test information
CN115809241A (en) Data storage method, device, computer equipment and storage medium
WO2025019581A1 (en) Data digitization via custom integrated machine learning ensembles
WO2025019577A1 (en) Data digitization via custom integrated machine learning ensembles
CN114968914A (en) Electronic archive management method and device, computer equipment and storage medium
US20240070319A1 (en) Dynamically updating classifier priority of a classifier model in digital data discovery
CN104412251A (en) System and method for bidirectional search engine and application thereof
US20220405770A1 (en) System and methods for verifying merchandise authenticity
CN119227104A (en) Data processing method, device, equipment and storage medium based on big data analysis
CN117133006A (en) Document verification method, device, computer equipment and storage medium
CN117492752A (en) A page dynamic configuration method, device, computer equipment and storage medium
CN115081447A (en) Requirements document construction method, device, device and storage medium for software development
CN114416572A (en) Form test method, device, computer equipment and storage medium
CN117875983B (en) A traceability method for electronic data of first-time drug sales based on shared transmission
CN111414260A (en) Software system data processing method, device and computer-readable storage medium

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200901)