
CN119088801A - Data storage method, indexing method and storage system supporting retrieval - Google Patents

Data storage method, indexing method and storage system supporting retrieval

Info

Publication number
CN119088801A
Authority
CN
China
Prior art keywords
words
main body
storage
characteristic
storage main
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411076066.9A
Other languages
Chinese (zh)
Inventor
金峰
毛萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhihe Intelligent Technology Development Co ltd
Original Assignee
Zhongneng Shuchuang Tianjin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongneng Shuchuang Tianjin Technology Co ltd filed Critical Zhongneng Shuchuang Tianjin Technology Co ltd
Priority to CN202411076066.9A priority Critical patent/CN119088801A/en
Publication of CN119088801A publication Critical patent/CN119088801A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2272Management thereof
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24547Optimisations to support specific applications; Extensibility of optimisers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of data processing and discloses a data storage method supporting retrieval, an indexing method, and a storage system. The method comprises: comparing the content of a single storage body with a stop word list to determine a plurality of feature words in that storage body; determining a characterization capability parameter of each feature word for the storage body; and generating an index information table from the feature words and their characterization capability parameters and storing the table together with the storage body. The characterization capability parameter is determined from the continuous feature value and the number of occurrences of the feature word, so it accounts for the correlation between how continuously a word appears in the storage body and how well that word characterizes the storage body. Data is thereby stored in a form that supports later retrieval, the retrieved content is effectively indexed in the database, and invalid retrievals are effectively avoided.

Description

Data storage method, indexing method and storage system supporting retrieval
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data storage method, an indexing method, and a storage system supporting retrieval.
Background
In the Internet era, service applications are growing rapidly and big-data operations have become the norm; the volume of data and the complexity of its content put great pressure on data retrieval in databases.
The prior art discloses a data storage and retrieval device comprising an index file management module, a positioning file management module, and a data file management module. The index file management module creates index information for the data and records it in an index file; the positioning file management module creates storage location information for the data and records it in a positioning file; and the data file management module stores the data in a data file for retrieval according to the storage file information. With this technical solution, loading too much data at once can be avoided in data applications, particularly where mass data is used, which reduces system pressure and improves the efficiency of data storage, retrieval, and recovery.
However, the above technical solution guarantees neither the efficiency nor the effectiveness of data retrieval.
Disclosure of Invention
The invention aims to provide a data storage method supporting retrieval, an indexing method, and a storage system that store data in a manner that supports its later retrieval and effectively avoid invalid retrievals.
To this end, in a first aspect, the present invention provides a data storage method supporting retrieval, comprising:
comparing the content of a single storage body with a stop word list to determine a plurality of feature words in the single storage body;
determining a characterization capability parameter of each feature word for the storage body;
generating an index information table from the feature words of the storage body and the characterization capability parameters of the feature words, and storing the index information table together with the storage body;
wherein the feature words are the words, excluding those contained in the stop word list, whose numbers of occurrences in the storage body rank within a preset number of top positions;
the characterization capability parameter is determined from the continuous feature value and the number of occurrences of the feature word, and is positively correlated with each of them;
for a storage body of text type, the continuous feature value is the ratio of the maximum number of consecutive fields in which the feature word appears continuously, among a number of consecutive fields of equal interval, to the total number of fields contained in the storage body;
for a storage body of video or audio type, the continuous feature value is the ratio of the maximum number of consecutive periods in which the feature word appears continuously, among a number of consecutive periods of equal interval, to the total number of periods contained in the storage body.
As a preferred technical solution of the data storage method supporting retrieval, in the step of determining the characterization capability parameter of each feature word for the storage body, the product of the continuous feature value of the feature word and the number of occurrences of the feature word is determined as the characterization capability parameter of the feature word for the storage body.
As a preferred technical solution of the data storage method supporting retrieval, with the words contained in the stop word list excluded, the feature words in the storage body are determined, in response to the storage body category being text, by the following steps:
counting the words that occur more than once in the storage body;
comparing the counted numbers of occurrences, and determining the preset number of top-ranked words as the feature words of the storage body.
As a preferred technical solution of the data storage method supporting retrieval, with the words contained in the stop word list excluded, the feature words in the storage body are determined, in response to the storage body category being audio, by the following steps:
performing audio text recognition on the storage body;
counting the words that occur more than once in the storage body;
comparing the counted numbers of occurrences, and determining the preset number of top-ranked words as the feature words of the storage body.
As a preferred technical solution of the data storage method supporting retrieval, with the words contained in the stop word list excluded, the feature words in the storage body are determined, in response to the storage body category being video, by the following steps:
performing audio text recognition and image text recognition on the storage body, respectively;
counting the words that occur more than once in the storage body;
comparing the counted numbers of occurrences, and determining the preset number of top-ranked words as the feature words of the storage body.
As a preferred technical solution of the data storage method supporting retrieval, after the characterization capability parameters of the feature words for the storage body are determined, the method further comprises:
performing semantic analysis on the feature words of the storage body, and setting the characterization capability parameters of a plurality of feature words having the same semantics to the highest value among the characterization capability parameters of those feature words.
In a second aspect, the present invention provides an indexing method for retrieving from a database obtained by the data storage method supporting retrieval of the above solution, comprising:
determining the storage bodies whose feature words match the retrieval content;
reading the corresponding index information tables, and determining the display order of the index information of the storage bodies according to the characterization capability parameters of the feature words for the storage bodies.
As a preferred technical solution of the indexing method, determining the display order of the index information of the storage bodies according to the characterization capability parameters of the feature words for the storage bodies comprises:
determining, for each storage body, the feature words that match the retrieval content;
taking the descending order of the sum of the characterization capability parameters of the feature words of each storage body that match the retrieval content as the display order of the index information of the storage bodies.
In a third aspect, the present invention provides a data storage system that stores data by applying the above data storage method supporting retrieval, comprising:
a data storage module for storing the storage body;
an extraction module, connected to the data storage module, for extracting the feature words of the storage body and counting the numbers of occurrences and the occurrence nodes of the feature words;
a computing module, connected to the extraction module, for computing the continuous feature values and the characterization capability parameters of the feature words;
an index support module, connected to the extraction module and the computing module respectively, for generating and storing an index data table comprising the feature words of the storage body and the characterization capability parameters of those feature words.
As a preferred technical solution of the data storage system, the computing module is provided with a semantic analysis unit for determining feature words having the same semantics and refreshing the characterization capability parameters of those feature words.
The beneficial effects of the invention are as follows:
In the data storage method supporting retrieval, determining the characterization capability parameter takes into account the correlation between the continuity of a word in the storage body and the word's ability to characterize the storage body. In practical applications, a feature word may appear concentrated in a single period; in that case it can only characterize the content of that period, so its number of occurrences alone is a weak indicator, whereas a feature word that appears continuously across several adjacent periods is more representative. A characterization capability parameter determined from both the number of occurrences and the continuous feature value indicates the extent to which the feature word runs through the storage body, and therefore better reflects the correlation between the feature word and the storage body.
Furthermore, the continuous feature values of storage bodies of different types are determined as the ratio of fields to total fields and the ratio of periods to total periods, respectively. Storage bodies of audio, video, and text types are thereby handled uniformly in response to retrieval, the characterization capability parameters are comparable across storage bodies of different types, the index information displayed when several file types are retrieved is accurate and reliable, and effective indexing of the retrieved content in the database is further ensured.
Drawings
FIG. 1 is a flow chart of a method of data storage supporting retrieval in an embodiment of the invention;
FIG. 2 is a flow chart of an indexing method in an embodiment of the invention;
FIG. 3 is a block diagram of a data storage system in an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings; the described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
In the description of the present invention, it should be noted that orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on the orientations or positional relationships shown in the drawings. They are used merely to simplify the description and do not indicate or imply that the devices or elements referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance; the terms "first location" and "second location" denote two distinct locations. A first feature being "above" or "over" a second feature includes the first feature being directly above or obliquely above the second feature, or simply indicates that the first feature is at a higher level than the second feature. A first feature being "under", "below", or "beneath" a second feature includes the first feature being directly under or obliquely below the second feature, or simply indicates that the first feature is at a lower level than the second feature.
In the description of the present invention, unless explicitly stated or limited otherwise, the terms "mounted" and "connected" are to be construed broadly: a connection may be, for example, fixed, detachable, or integral; mechanical or electrical; direct, or indirect via an intervening medium; or a communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
For ease of understanding, the following term used in the present application is explained:
Stop words: in information retrieval, certain characters or words are automatically filtered out before or after data processing in order to save storage space and improve search efficiency; these characters or words are called stop words.
Referring to fig. 1, the present embodiment provides a data storage method supporting retrieval, where the data storage method supporting retrieval includes:
Step S1, comparing the content of a single storage body with a stop word list to determine a plurality of feature words in the single storage body; it will be understood that the storage body is the data to be stored in the database;
Step S2, determining the characterization capability parameter of each feature word for the storage body;
Step S3, generating an index information table from the feature words of the storage body and the characterization capability parameters of the feature words, and storing the index information table together with the storage body.
The feature words are the words, excluding those contained in the stop word list, whose numbers of occurrences in the storage body rank within a preset number of top positions. The stop word list needs to be determined in light of the actual scenario, such as the language and the field; establishing a stop word list is prior art and is not described further here.
The characterization capability parameter is determined from the continuous feature value and the number of occurrences of the feature word, and is positively correlated with each of them;
for a storage body of text type, the continuous feature value is the ratio of the maximum number of consecutive fields in which the feature word appears continuously, among a number of consecutive fields of equal interval, to the total number of fields contained in the storage body;
for a storage body of video or audio type, the continuous feature value is the ratio of the maximum number of consecutive periods in which the feature word appears continuously, among a number of consecutive periods of equal interval, to the total number of periods contained in the storage body.
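The period-based definition for audio and video can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the function names `period_presence` and `continuous_feature_value` are hypothetical, and it is assumed that text recognition yields a timestamp in seconds for each occurrence of the feature word.

```python
def period_presence(timestamps, duration, n_periods):
    """Mark which equal-length periods contain at least one occurrence."""
    period_len = duration / n_periods
    flags = [False] * n_periods
    for t in timestamps:
        # Clamp so an occurrence exactly at the end falls in the last period.
        idx = min(int(t / period_len), n_periods - 1)
        flags[idx] = True
    return flags


def continuous_feature_value(flags):
    """Longest run of consecutive True flags divided by the number of periods."""
    best = run = 0
    for present in flags:
        run = run + 1 if present else 0
        best = max(best, run)
    return best / len(flags)


# A 600-second recording split into 10 periods of 60 seconds (made-up data).
flags = period_presence([65.0, 130.0, 185.0, 400.0], duration=600.0, n_periods=10)
value = continuous_feature_value(flags)  # periods 1, 2 and 3 form the longest run
print(value)
```

The same `continuous_feature_value` function applies unchanged to text if the flags are computed per field instead of per period, which is what makes the two ratios comparable.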
In the above embodiment, determining the characterization capability parameter takes into account the correlation between the continuity of a word in the storage body and the word's ability to characterize the storage body. In practical applications, a feature word may appear concentrated in a single period; in that case it can only characterize the content of that period, so its number of occurrences alone is a weak indicator, whereas a feature word that appears continuously across adjacent periods is more representative. A characterization capability parameter determined from both the number of occurrences and the continuous feature value indicates the extent to which the feature word runs through the storage body, and therefore better reflects the correlation between the feature word and the storage body.
Furthermore, this embodiment determines the continuous feature values of storage bodies of different types as the ratio of fields to total fields and the ratio of periods to total periods, respectively. This allows storage bodies of audio, video, and text types to be handled uniformly in response to retrieval, makes the characterization capability parameters comparable across storage bodies of different types, keeps the index information displayed when several file types are retrieved accurate and reliable, and further ensures effective indexing of the retrieved content in the database.
In detail, before the continuous feature value is determined, the storage body is divided into a number of fields or periods of equal interval. To keep storage bodies of different lengths comparable, the number of fields or periods must be proportional to the length of the content being divided, and the storage body is divided into more than 5 fields or periods.
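One way to read this division rule for text is sketched below in Python. The helper name `split_into_fields` and its defaults are assumptions for illustration: a target of 200 words per field makes the field count grow in proportion to the text length, and a floor of 6 fields reflects the "more than 5" requirement.

```python
import math


def split_into_fields(words, words_per_field=200, min_fields=6):
    """Split a list of words into fields of (nearly) equal size."""
    # Field count grows in proportion to the text length and is always above 5.
    n_fields = max(min_fields, math.ceil(len(words) / words_per_field))
    # Proportional boundaries keep field sizes within one word of each other.
    bounds = [i * len(words) // n_fields for i in range(n_fields + 1)]
    return [words[bounds[i]:bounds[i + 1]] for i in range(n_fields)]


fields = split_into_fields(["w"] * 20000)
print(len(fields), len(fields[0]))  # 100 fields of 200 words each
```

For very short texts the floor takes over, so fields become shorter than the target rather than fewer than six.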
Specifically, in determining the characterization capability parameter of each feature word for the storage body, the product of the continuous feature value of the feature word and its number of occurrences is determined as the characterization capability parameter of the feature word for the storage body.
For example, a storage body is divided into 100 fields of 200 words each. The feature word "data set" appears in the 10th through 19th fields and in the 15th through 18th fields, so the maximum number of consecutive fields in which it appears is 10 and the continuous feature value is 10/100 = 0.1. The feature word "data set" occurs 98 times in the whole storage body, so its characterization capability parameter for the storage body is 0.1 × 98 = 9.8.
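The arithmetic of this example can be checked with a short Python sketch. The function names are hypothetical, and the per-field presence flags are an assumed representation of where the feature word appears.

```python
def longest_run(present):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in present:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def characterization_parameter(present, occurrences):
    """Continuous feature value multiplied by the number of occurrences."""
    continuous_value = longest_run(present) / len(present)
    return continuous_value * occurrences


# 100 fields numbered from 1; the word is present in fields 10 through 19.
present = [10 <= field_no <= 19 for field_no in range(1, 101)]
parameter = characterization_parameter(present, occurrences=98)
print(round(parameter, 2))
```

Because the parameter is a product, a word that occurs often but only in one short stretch scores lower than a word with the same count spread over a long unbroken run of fields.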
Specifically, with the words contained in the stop word list excluded, the feature words in the storage body are determined, in response to the storage body category being text, by the following steps:
Step S101, counting the words that occur more than once in the storage body;
Step S102, comparing the counted numbers of occurrences, and determining the preset number of top-ranked words as the feature words of the storage body.
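Steps S101 and S102 can be sketched as follows. This is a minimal Python illustration that assumes the text has already been split into words; the function name `extract_feature_words` and the alphabetical tie-break are assumptions, since the text does not say how ties are ordered.

```python
from collections import Counter


def extract_feature_words(words, stop_words, preset_number):
    """Return the top `preset_number` repeated non-stop words with their counts."""
    counts = Counter(w for w in words if w not in stop_words)
    # Step S101: keep only words that occur more than once.
    repeated = [(w, c) for w, c in counts.items() if c > 1]
    # Step S102: rank by count (ties broken alphabetically) and keep the top N.
    repeated.sort(key=lambda wc: (-wc[1], wc[0]))
    return repeated[:preset_number]


text = "the data set stores the data and the index of the data set"
features = extract_feature_words(text.split(), {"the", "and", "of"}, preset_number=2)
print(features)  # [('data', 3), ('set', 2)]
```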
Specifically, with the words contained in the stop word list excluded, the feature words in the storage body are determined, in response to the storage body category being audio, by the following steps:
Step S111, performing audio text recognition on the storage body;
Step S112, counting the words that occur more than once in the storage body;
Step S113, comparing the counted numbers of occurrences, and determining the preset number of top-ranked words as the feature words of the storage body.
Specifically, with the words contained in the stop word list excluded, the feature words in the storage body are determined, in response to the storage body category being video, by the following steps:
Step S121, performing audio text recognition and image text recognition on the storage body, respectively;
Step S122, counting the words that occur more than once in the storage body;
Step S123, comparing the counted numbers of occurrences, and determining the preset number of top-ranked words as the feature words of the storage body.
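The three category-specific procedures differ only in how the words are obtained, which the following Python sketch makes explicit. The recognizers are passed in as plain callables and replaced by stubs here, because the text does not name any recognition engine; `collect_words`, `top_words`, and the stub outputs are all hypothetical.

```python
from collections import Counter


def collect_words(body, category, recognize_audio=None, recognize_images=None):
    """Return the word list of a storage body according to its category."""
    if category == "text":
        return body.split()
    if category == "audio":
        return list(recognize_audio(body))
    if category == "video":
        # Video combines words heard in the soundtrack and words seen in frames.
        return list(recognize_audio(body)) + list(recognize_images(body))
    raise ValueError(f"unsupported category: {category}")


def top_words(words, stop_words, preset_number):
    """Repeated non-stop words, ranked by count, limited to the preset number."""
    counts = Counter(w for w in words if w not in stop_words)
    repeated = [(w, c) for w, c in counts.items() if c > 1]
    repeated.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in repeated[:preset_number]]


# Stub recognizers standing in for real speech and on-screen text recognition.
fake_audio = lambda body: ["index", "data", "the", "data"]
fake_images = lambda body: ["data", "index", "chart"]
words = collect_words(b"<video bytes>", "video", fake_audio, fake_images)
features = top_words(words, stop_words={"the"}, preset_number=2)
print(features)  # ['data', 'index']
```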
Of course, in implementation the preset number needs to be set adaptively according to the amount of data contained in the storage body. The preset number must not be too small, so that the preset number of feature words can effectively represent the content of the storage body; nor may it be too large, so that the feature words include few words unrelated to the content of the storage body and the waste of storage space and computing power is reduced.
In the above embodiment, it is considered that in practical applications the original text description of an audio, video, or text storage body is not necessarily accurate or effective, and the invalid or poor-quality retrievals this causes need to be eliminated. Recognizing the text of the storage body during storage and extracting feature words from it avoids the influence of this problem and improves the retrievability of the resulting database.
Specifically, after the characterization capability parameters of the feature words for the storage body are determined, the method further comprises:
performing semantic analysis on the feature words of the storage body, and setting the characterization capability parameters of a plurality of feature words having the same semantics to the highest value among the characterization capability parameters of those feature words. Optionally, NLP (natural language processing) technology is used for the semantic analysis. Redetermining the characterization capability parameters after semantic analysis further improves the retrieval effect.
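The refresh step can be sketched in Python once the semantic analysis has grouped the feature words. How the groups are found is left open by the text, so the sketch simply takes them as input; the function name `refresh_by_semantics` and the sample values are hypothetical.

```python
def refresh_by_semantics(parameters, synonym_groups):
    """Give every word in a same-meaning group the group's highest parameter."""
    refreshed = dict(parameters)
    for group in synonym_groups:
        members = [w for w in group if w in parameters]
        if len(members) > 1:
            highest = max(parameters[w] for w in members)
            for w in members:
                refreshed[w] = highest
    return refreshed


parameters = {"internet": 0.18, "web": 0.30, "data": 0.22}
refreshed = refresh_by_semantics(parameters, [{"internet", "web"}])
print(refreshed)  # {'internet': 0.3, 'web': 0.3, 'data': 0.22}
```

With this refresh, a query for either synonym ranks the storage body as highly as its best-scoring synonym would.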
Referring to fig. 2, the present embodiment further provides an indexing method for retrieving from a database obtained by the data storage method supporting retrieval of the above solution, comprising:
Step S01, determining the storage bodies whose feature words match the retrieval content;
Step S02, reading the corresponding index information tables, and determining the display order of the index information of the storage bodies according to the characterization capability parameters of the feature words for the storage bodies.
Specifically, determining the display order of the index information of the storage bodies according to the characterization capability parameters of the feature words for the storage bodies comprises:
determining, for each storage body, the feature words that match the retrieval content;
taking the descending order of the sum of the characterization capability parameters of the feature words of each storage body that match the retrieval content as the display order of the index information of the storage bodies.
Illustratively, the retrieval content is "internet data":
Storage body A contains the feature word "internet", with a characterization capability parameter of 0.18.
Storage body B contains the feature word "data", with a characterization capability parameter of 0.22.
Storage body C contains the feature word "internet", with a characterization capability parameter of 0.18, and the feature word "data", with a characterization capability parameter of 0.22.
Storage body A: sum of characterization capability parameters = 0.18.
Storage body B: sum of characterization capability parameters = 0.22.
Storage body C: sum of characterization capability parameters = 0.18 + 0.22 = 0.40.
The index information is displayed in the following order:
1. Storage body C (0.40);
2. Storage body B (0.22);
3. Storage body A (0.18).
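The ranking in this example can be reproduced with a short Python sketch. It assumes the retrieval content has already been split into terms and that each index information table is a mapping from feature word to characterization capability parameter; the function name `rank_storage_bodies` is hypothetical.

```python
def rank_storage_bodies(query_terms, index_tables):
    """Order matching storage bodies by the sum of their matched parameters."""
    scores = {}
    for body_id, table in index_tables.items():
        matched = [term for term in query_terms if term in table]
        if matched:  # storage bodies with no matching feature word are skipped
            scores[body_id] = sum(table[term] for term in matched)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


index_tables = {
    "A": {"internet": 0.18},
    "B": {"data": 0.22},
    "C": {"internet": 0.18, "data": 0.22},
}
ranking = rank_storage_bodies(["internet", "data"], index_tables)
for body_id, score in ranking:
    print(body_id, round(score, 2))
```

The printed order is C, B, A, matching the display order listed in the example.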
Referring to fig. 3, the present embodiment further provides a data storage system that stores data by applying the above data storage method supporting retrieval, comprising:
a data storage module for storing the storage body;
an extraction module, connected to the data storage module, for extracting the feature words of the storage body and counting the numbers of occurrences and the occurrence nodes of the feature words;
a computing module, connected to the extraction module, for computing the continuous feature values and the characterization capability parameters of the feature words;
an index support module, connected to the extraction module and the computing module respectively, for generating and storing an index data table comprising the feature words of the storage body and the characterization capability parameters of those feature words.
As a preferred technical solution of the data storage system, the computing module is provided with a semantic analysis unit for determining feature words having the same semantics and refreshing the characterization capability parameters of those feature words; that is, the characterization capability parameters of a plurality of feature words having the same semantics are all set to the highest value among the characterization capability parameters of those feature words.
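How the four modules might be wired together is sketched below in Python. The class and method names are hypothetical, the storage body is assumed to be text already divided into fields, "occurrence nodes" are modeled as one presence flag per field, and the semantic analysis unit is omitted for brevity.

```python
from collections import Counter


class DataStorageModule:
    """Stores storage bodies, here as lists of fields (each a list of words)."""
    def __init__(self):
        self.bodies = {}

    def store(self, body_id, fields):
        self.bodies[body_id] = fields


class ExtractionModule:
    """Extracts feature words with their counts and occurrence nodes."""
    def __init__(self, stop_words, preset_number):
        self.stop_words = stop_words
        self.preset_number = preset_number

    def extract(self, fields):
        counts = Counter(w for f in fields for w in f if w not in self.stop_words)
        top = [w for w, c in counts.most_common(self.preset_number) if c > 1]
        # Occurrence nodes are modeled as one presence flag per field.
        return {w: (counts[w], [w in f for f in fields]) for w in top}


class ComputingModule:
    """Computes continuous feature value times number of occurrences."""
    def compute(self, occurrences, presence):
        best = run = 0
        for present in presence:
            run = run + 1 if present else 0
            best = max(best, run)
        return best / len(presence) * occurrences


class IndexSupportModule:
    """Generates and stores one index data table per storage body."""
    def __init__(self):
        self.tables = {}

    def build(self, body_id, extracted, computing):
        self.tables[body_id] = {
            w: computing.compute(count, presence)
            for w, (count, presence) in extracted.items()
        }


storage, computing, index = DataStorageModule(), ComputingModule(), IndexSupportModule()
extraction = ExtractionModule(stop_words={"the"}, preset_number=1)
fields = [["the", "data", "index"], ["data"], ["misc"], ["data"]]
storage.store("doc-1", fields)
index.build("doc-1", extraction.extract(storage.bodies["doc-1"]), computing)
print(index.tables)  # {'doc-1': {'data': 1.5}}
```

Here "data" occurs 3 times and its longest run covers 2 of the 4 fields, giving 2/4 × 3 = 1.5.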
The modules involved in the embodiments of the present application may be implemented in software or in hardware. The described modules may also be provided in a processor, for example, a processor may be described as including a data storage module, an extraction module, a calculation module, and an index support module. The names of these modules do not constitute a limitation on the module itself in some cases.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It is to be understood that the above examples of the present invention are provided only for clarity of illustration and do not limit the embodiments of the present invention. Other variations or modifications based on the above teachings will be apparent to those of ordinary skill in the art. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention is intended to be protected by the following claims.

Claims (10)

1. A data storage method supporting retrieval, comprising:
comparing the content of a single storage body with a stop word list to determine a plurality of feature words in the single storage body;
determining a characterization capability parameter of each feature word for the storage body;
generating an index information table from the feature words of the storage body and the characterization capability parameters of the feature words, and storing the index information table together with the storage body;
wherein, with the words contained in the stop word list excluded, the feature words are the words whose number of occurrences in the storage body ranks within a preset number of top positions;
the characterization capability parameter is determined according to a continuous feature value and the number of occurrences of the feature word, and is positively correlated with each of the continuous feature value and the number of occurrences;
for a storage body of text type, the continuous feature value is the ratio of the maximum number of consecutive fields in which the feature word appears continuously, among a plurality of consecutive fields of equal interval, to the total number of fields contained in the storage body;
for a storage body of video or audio type, the continuous feature value is the ratio of the maximum number of consecutive periods in which the feature word appears continuously, among a plurality of consecutive periods of equal interval, to the total number of periods contained in the storage body.
2. The data storage method supporting retrieval according to claim 1, wherein, in determining the characterization capability parameter of each feature word for the storage body, the product of the continuous feature value of the feature word and the number of occurrences of the feature word is determined as the characterization capability parameter of the feature word for the storage body.
3. The data storage method supporting retrieval according to claim 2, further comprising, after determining the characterization capability parameters of the feature words for the storage body:
performing semantic analysis on the feature words of the storage body, and setting the characterization capability parameters of a plurality of feature words having the same meaning to the highest value among the characterization capability parameters of those feature words.
4. The data storage method supporting retrieval according to claim 3, wherein, with the words contained in the stop word list excluded, the feature words in the storage body are determined, in response to the category of the storage body being text, by performing the following steps:
counting the words that occur more than once in the storage body;
comparing the counted numbers of occurrences of the words, and determining the words ranked within the preset number of top positions as the feature words of the storage body.
5. The data storage method supporting retrieval according to claim 4, wherein, with the words contained in the stop word list excluded, the feature words in the storage body are determined, in response to the category of the storage body being audio, by performing the following steps:
performing audio text recognition on the storage body;
counting the words that occur more than once in the storage body;
comparing the counted numbers of occurrences of the words, and determining the words ranked within the preset number of top positions as the feature words of the storage body.
6. The data storage method supporting retrieval according to claim 5, wherein, with the words contained in the stop word list excluded, the feature words in the storage body are determined, in response to the category of the storage body being video, by performing the following steps:
performing audio text recognition and image text recognition on the storage body, respectively;
counting the words that occur more than once in the storage body;
comparing the counted numbers of occurrences of the words, and determining the words ranked within the preset number of top positions as the feature words of the storage body.
7. An indexing method for retrieval of a database obtained by the data storage method supporting retrieval according to any one of claims 1 to 6, comprising:
determining the storage bodies whose feature words match the retrieval content;
reading the corresponding index information tables, and determining the display order of the index information of the storage bodies according to the characterization capability parameters of the feature words for the storage bodies.
8. The indexing method according to claim 7, wherein determining the display order of the index information of the storage bodies according to the characterization capability parameters of the feature words for the storage bodies comprises:
determining, for each storage body, the feature words that match the retrieval content;
taking the descending order of the sum of the characterization capability parameters of the feature words of each storage body that match the retrieval content as the display order of the index information of the storage bodies.
9. A data storage system for storing data using the data storage method supporting retrieval according to any one of claims 1 to 6, comprising:
a data storage module for storing the storage body;
an extraction module, connected to the data storage module, for extracting the feature words of the storage body and counting the number of occurrences and the occurrence nodes of the feature words;
a calculation module, connected to the extraction module, for calculating the continuous feature values and the characterization capability parameters of the feature words;
an index support module, connected to the extraction module and the calculation module respectively, for generating and storing an index data table comprising the feature words of the storage body and the characterization capability parameters of those feature words.
10. The data storage system according to claim 9, wherein the calculation module is provided with a semantic analysis unit for determining feature words having the same meaning and refreshing the characterization capability parameters of the feature words having the same meaning.
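The ordering rule of claims 7 and 8 can be illustrated with a short sketch: each storage body whose feature words match the retrieval content is scored by the sum of the characterization capability parameters of its matching feature words, and results are displayed in descending order of that sum. Treating "matching" as exact equality between a query term and a feature word, the function name, and the shape of `index_tables` are assumptions made for illustration; the claims do not fix them.

```python
def rank_storage_bodies(index_tables, query_terms):
    """Order matching storage bodies by the summed parameters of matched words.

    index_tables: {storage body id: {feature word: characterization parameter}}
    query_terms:  iterable of terms taken from the retrieval content
    """
    terms = set(query_terms)
    scores = {}
    for body_id, table in index_tables.items():
        matched = [table[t] for t in terms if t in table]
        if matched:  # bodies with no matching feature word are not returned
            scores[body_id] = sum(matched)
    # Largest sum first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda b: (-scores[b], b))


index_tables = {
    "doc_a": {"grid": 1.5, "load": 1.0},
    "doc_b": {"grid": 2.0},
    "doc_c": {"storage": 3.0},
}
order = rank_storage_bodies(index_tables, ["grid", "load"])
```

In this example "doc_a" scores 2.5 and "doc_b" scores 2.0, so "doc_a" is displayed first; "doc_c" has no matching feature word and is omitted.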

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411076066.9A CN119088801A (en) 2024-08-07 2024-08-07 Data storage method, indexing method and storage system supporting retrieval

Publications (1)

Publication Number Publication Date
CN119088801A true CN119088801A (en) 2024-12-06

Family

ID=93699893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411076066.9A Pending CN119088801A (en) 2024-08-07 2024-08-07 Data storage method, indexing method and storage system supporting retrieval

Country Status (1)

Country Link
CN (1) CN119088801A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005085112A (en) * 2003-09-10 2005-03-31 Toshiba Corp Information classification system and program
CN103605665A (en) * 2013-10-24 2014-02-26 杭州电子科技大学 Keyword based evaluation expert intelligent search and recommendation method
CN108255985A (en) * 2017-12-28 2018-07-06 东软集团股份有限公司 Data directory construction method, search method and device, medium and electronic equipment
CN111814770A (en) * 2020-09-04 2020-10-23 中山大学深圳研究院 Content keyword extraction method of news video, terminal device and medium
CN116304104A (en) * 2023-03-23 2023-06-23 上海瑾盛通信科技有限公司 Knowledge map construction method, knowledge map construction device, medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王璐: ""连续时间区间内的频繁词序列挖掘算法"", 《计算机工程》, vol. 48, no. 2, 15 February 2022 (2022-02-15), pages 79 - 85 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20250723

Address after: 200000 Shanghai City Minhang District Shen Gui Road 989 Lane No. 2 Building 102 Room

Applicant after: Shanghai Zhihe Intelligent Technology Development Co.,Ltd.

Country or region after: China

Address before: 300192 Tianjin City Nankai District Ke Yan East Road Tianjin Science and Technology Plaza 5-1-901 (Tiankai Park)

Applicant before: Zhongneng Shuchuang (Tianjin) Technology Co.,Ltd.

Country or region before: China
