CN108304815B - Data acquisition method, device, server and storage medium

Info

Publication number: CN108304815B
Application number: CN201810128506.9A
Authority: CN
Inventors: 马立
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2018-02-08
Filing date: 2018-02-08
Publication date: 2021-07-02
Anticipated expiration: 2038-02-08
Also published as: CN108304815A

Abstract

The embodiment of the invention discloses a data acquisition method, a data acquisition device, a server and a storage medium. The method is applied to a server and comprises the following steps: receiving original image data sent by a client and attribute information corresponding to the original image data; and acquiring target text data corresponding to the original image data from the original image data according to the attribute information corresponding to the original image data. The data acquisition method, device, server and storage medium provided by the embodiment of the invention can not only improve data acquisition efficiency but also ensure the accuracy and security of the acquired data; they are also simple to implement, easy to popularize, and widely applicable.

Description

Data acquisition method, device, server and storage medium

Technical Field

The embodiment of the invention relates to the technical field of internet, in particular to a data acquisition method, a data acquisition device, a server and a storage medium.

Background

In the course of an enterprise's operations, departments such as sales and marketing often need to acquire data from paper media such as user business cards, invoices, contracts, and labels, and then store that data in electronic equipment. Existing data acquisition methods generally take one of two forms. The first is manual entry. For example, a user's business card typically includes the user's name, gender, home address, telephone number, work unit, email address, and the like. Entering this information into electronic equipment by hand consumes a great deal of labor and time, so data acquisition efficiency is low and the accuracy of the acquired data cannot be ensured. The second is Optical Character Recognition (OCR). OCR refers to the process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text using character recognition methods. In the prior art, data is generally acquired by OCR software on local electronic equipment, and the acquired data is stored only on that local equipment, so the security of the acquired data cannot be guaranteed.

Disclosure of Invention

In view of this, embodiments of the present invention provide a data acquisition method, an apparatus, a server, and a storage medium, which can not only improve data acquisition efficiency, but also ensure accuracy and security of acquired data.

In a first aspect, an embodiment of the present invention provides a data acquisition method, which is applied to a server, and the method includes:

receiving original image data sent by a client and attribute information corresponding to the original image data;

and acquiring target text data corresponding to the original image data from the original image data according to the attribute information corresponding to the original image data.

In a second aspect, an embodiment of the present invention provides a data acquisition apparatus, where the apparatus includes a receiving module and an obtaining module; wherein,

the receiving module is used for receiving original image data sent by a client and predetermined attribute information corresponding to the original image data;

the obtaining module is used for acquiring target text data corresponding to the original image data from the original image data according to the attribute information corresponding to the original image data.

In a third aspect, an embodiment of the present invention provides a server, including:

one or more processors;

a memory for storing one or more programs,

when the one or more programs are executed by the one or more processors, the one or more processors implement the data acquisition method according to any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the data acquisition method according to any embodiment of the present invention.

The embodiment of the invention provides a data acquisition method, a data acquisition device, a server and a storage medium, wherein the server firstly receives original image data sent by a client and attribute information corresponding to the original image data; and then the server acquires target text data corresponding to the original image data from the original image data according to the attribute information corresponding to the original image data. That is to say, in the technical scheme of the invention, the client is only responsible for the acquisition of the original image data, but not for the identification of the original image data; the server is only responsible for the identification of the original image data and not for the acquisition of the original image data. And the server can acquire the target text data corresponding to the original image data in the original image data more quickly according to the attribute information corresponding to the original image data, so that the data acquisition efficiency is improved, and the accuracy and the safety of the acquired data are ensured. The existing data acquisition method adopts a manual entry mode to acquire data or adopts an OCR mode to acquire data. Therefore, compared with the prior art, the data acquisition method, the data acquisition device, the server and the storage medium provided by the embodiment of the invention can not only improve the data acquisition efficiency, but also ensure the accuracy and the safety of the acquired data; moreover, the technical scheme of the embodiment of the invention is simple and convenient to realize, convenient to popularize and wider in application range.

Drawings

Fig. 1 is a flowchart of a data acquisition method according to a first embodiment of the present invention;

Fig. 2 is a flowchart of a data acquisition method according to a second embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a data acquisition apparatus according to a third embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a client interacting with a server according to a fourth embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a server according to a fifth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings.

Example one

Fig. 1 is a flowchart of a data acquisition method according to a first embodiment of the present invention. This embodiment is applicable to acquiring data from physical media such as paper, and the method may be executed by a server. As shown in Fig. 1, the data acquisition method may include the following steps:

S110, receiving the original image data sent by the client and the attribute information corresponding to the original image data.

In the present embodiment, the client refers to any device that can photograph or scan characters printed on paper, such as a scanner, a digital camera, or a smart device with a camera. The original image data refers to image data photographed or scanned by such a device, for example an intelligent terminal. The client acquires the original image data by photographing or scanning the characters printed on paper.

The client and the server establish a communication connection through an Application Programming Interface (API) layer. Further, the client is also provided with a storage module: while the communication connection between the client and the server is disconnected, the original image data is stored in the storage module, and once the connection is established, the original image data stored in the client is transmitted to the server.
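The store-then-forward behavior of the client's storage module can be sketched as follows. This is a minimal illustration only: the class name `ClientUploader`, the `send` callable, and the in-memory queue standing in for the storage module are all assumptions, not details given in this description.

```python
from collections import deque


class ClientUploader:
    """Hypothetical client-side buffer: holds images while offline, flushes on reconnect."""

    def __init__(self, send):
        # `send` is an assumed transport callable: send(image_bytes, attributes) -> None
        self._send = send
        # In-memory queue standing in for the client's storage module
        self._pending = deque()
        self.connected = False

    def submit(self, image, attributes):
        """Send immediately when connected; otherwise keep the image locally."""
        if self.connected:
            self._send(image, attributes)
        else:
            self._pending.append((image, attributes))

    def on_connected(self):
        """Called when the connection is established: transmit stored images in order."""
        self.connected = True
        while self._pending:
            image, attributes = self._pending.popleft()
            self._send(image, attributes)

    def on_disconnected(self):
        self.connected = False

    def pending_count(self):
        return len(self._pending)
```

A real client would persist the queue to durable storage and handle failed transmissions; both are omitted here for brevity.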

Further, before the client acquires the original image, basic system information such as enterprise information, business type, and CRM type needs to be configured in a business intelligence dashboard (BI dashboard).

Specifically, the attribute information corresponding to the original image data at least includes category information corresponding to the original image data. For example, the category information corresponding to the original image data may be a business card, an invoice, a contract, a label, and the like. Furthermore, attribute information corresponding to the original image data can also be input by a user on an interface of the intelligent terminal.

S120, acquiring target text data corresponding to the original image data from the original image data according to the attribute information corresponding to the original image data.

In this embodiment, the target text data refers to the electronic text data corresponding to the paper text information to be entered. Specifically, OCR analysis is performed on the original image data according to the attribute information corresponding to the original image data, so as to obtain the target text data corresponding to the original image data. For example, when an enterprise employee needs to enter a labor contract, the employee enters "contract" as the attribute information on the entry interface and photographs or scans the paper labor contract with the camera; OCR analysis is then performed on the image of the labor contract, and the electronic text of the labor contract corresponding to that image is obtained.

The process of performing OCR analysis on the original image mainly comprises the following steps: image input, comparison and identification, manual correction and result storage.

In this embodiment, the image input stage includes: receiving the original image data sent by the client, graying, binarization, denoising, inclination detection and correction, and character feature extraction. The original image data is grayed using a weighted average method, and the resulting grayscale image is binarized so that the image contains only black foreground information and white background information, which improves the efficiency and accuracy of recognition. The image to be recognized is then denoised according to the characteristics of the noise, which further improves recognition accuracy. For inclination detection and correction, the edge lines of the paper document's border are detected by the Hough transform to obtain the paper document region, and the inclination angle of the document is determined and corrected. Character feature extraction converts the original image data into a density matrix: a pixel is marked as a '0' element when fewer than 2 of its 8 neighboring pixels (above, below, left, right, and the four diagonals) are black. The density matrix is then examined row by row, consecutive '0' elements in a row are connected into a 'detection line', and whether a region is a text region is judged from the depth of the starting end, the length difference, and the end position of the 'detection lines'. A comparison database is also established; it contains the full character set to be recognized, with a feature group for each character obtained by the same feature extraction method that is applied to the input characters.
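The graying, binarization, and density-matrix steps can be sketched in plain Python as below. Several details are assumptions made for illustration: the description does not give the weights of the weighted average (the common 0.299/0.587/0.114 luma weights are used here), the fixed threshold of 128 is arbitrary, and marking all remaining pixels as 1 in the density matrix is a guess at the intended encoding.

```python
def to_gray(rgb_image):
    """Weighted-average graying; the weights are an assumed, commonly used choice."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray_image, threshold=128):
    """Map each pixel to 1 (black foreground) or 0 (white background)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_image]


def density_matrix(binary_image):
    """Mark a pixel 0 when fewer than 2 of its 8 neighbors are black, else 1."""
    height, width = len(binary_image), len(binary_image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            black_neighbors = sum(
                binary_image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            row.append(0 if black_neighbors < 2 else 1)
        result.append(row)
    return result
```

Denoising, Hough-based skew correction, and the detection-line analysis are left out; in practice an image-processing library would normally replace these hand-written loops.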

The comparison and identification may include: selecting a mathematical distance function suited to the features used, for example Euclidean-space comparison, relaxation comparison, dynamic programming (DP) comparison, neural-network-based database building and comparison, or hidden Markov model (HMM) comparison. Because the recognition rate of OCR cannot reach one hundred percent, the accuracy and confidence of the comparison are enhanced by a correction step: the recognized characters are used to determine groups of similar candidate characters, and the most plausible word is selected according to the recognized characters that precede and follow.
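As an illustration of the Euclidean-space comparison and of forming a candidate character group, the sketch below ranks database characters by the distance between feature vectors. The function name, the dictionary layout of the database, and the toy two-dimensional features are hypothetical; the description does not specify the feature representation.

```python
import math


def nearest_candidates(feature, database, k=3):
    """Return the k database characters whose feature vectors are closest to `feature`.

    `database` maps each character to its reference feature vector (assumed layout).
    """
    scored = sorted(
        (math.dist(feature, reference), char) for char, reference in database.items()
    )
    return [char for _, char in scored[:k]]


# Toy reference features for three characters (illustrative values only)
reference_db = {"O": [1.0, 0.0], "Q": [1.0, 0.2], "I": [0.0, 1.0]}
candidates = nearest_candidates([1.0, 0.05], reference_db, k=2)
```

Here `candidates` holds the two closest characters, which a later correction step could disambiguate using the neighboring recognized characters.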

Manual correction means that, after comparison and identification, the system interacts with the user so that any information that may not have been recognized is manually confirmed and adjusted to obtain the target text data.

Result storage means that, after manual correction, the target text data is stored at a preset position; whether the original image data is also stored is chosen as required.

It should be noted that this embodiment gives only a brief description of the process of performing OCR analysis on the original image and is not limiting. The user can design and optimize the OCR analysis process according to actual conditions and requirements.

Further, the data acquisition method may further include: receiving a data operation request sent by a client; the data operation request carries a data identifier of the target text data; and performing corresponding operation on the target text data according to the data identification of the target text data.

In the present embodiment, the data operations include conventional file operations such as storage, query, editing, and deletion. The data identifier of the target text data is a unique identifier that determines the target text data, and may be, for example, a file name or a code of the target text data. When an enterprise employee needs to perform a data operation on a certain target file, the target text data is searched for under the corresponding type on the client; once it is found, its data identifier is read and added to a data operation request, and the data operation request is sent to the server. The server receives the data operation request, obtains the data identifier of the target text data carried in the request, and performs the corresponding operation on the target text data according to that data identifier.
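A minimal sketch of how a server might dispatch such a data operation request by its data identifier is given below. The request fields `operation`, `data_id`, and `text`, and the in-memory dictionary used as the store, are assumptions for illustration; the description does not define a request format.

```python
class TextDataStore:
    """Hypothetical server-side store of target text data, keyed by data identifier."""

    def __init__(self):
        self._records = {}

    def handle(self, request):
        """Perform the operation named in the request on the identified text data."""
        operation = request["operation"]
        data_id = request["data_id"]
        if operation == "store":
            self._records[data_id] = request["text"]
            return True
        if operation == "query":
            return self._records.get(data_id)
        if operation == "edit":
            if data_id not in self._records:
                return False  # nothing to edit under this identifier
            self._records[data_id] = request["text"]
            return True
        if operation == "delete":
            return self._records.pop(data_id, None) is not None
        raise ValueError(f"unsupported operation: {operation!r}")
```

For example, a request such as `{"operation": "query", "data_id": "contract-001"}` would return the stored text for that identifier, or `None` if no such record exists.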

The embodiment of the invention provides a data acquisition method, wherein a server receives original image data sent by a client and attribute information corresponding to the original image data; and then the server acquires target text data corresponding to the original image data from the original image data according to the attribute information corresponding to the original image data. That is to say, in the technical scheme of the invention, the client is only responsible for the acquisition of the original image data, but not for the identification of the original image data; the server is only responsible for the identification of the original image data and not for the acquisition of the original image data. And the server can acquire the target text data corresponding to the original image data in the original image data more quickly according to the attribute information corresponding to the original image data, so that the data acquisition efficiency is improved, and the accuracy and the safety of the acquired data are ensured. The existing data acquisition method adopts a manual entry mode to acquire data or adopts an OCR mode to acquire data. Therefore, compared with the prior art, the data acquisition method provided by the embodiment of the invention not only can improve the data acquisition efficiency, but also can ensure the accuracy and the safety of the acquired data; moreover, the technical scheme of the embodiment of the invention is simple and convenient to realize, convenient to popularize and wider in application range.

Example two

Fig. 2 is a flowchart of a data acquisition method according to a second embodiment of the present invention. This embodiment further optimizes the data acquisition method on the basis of the foregoing embodiment. As shown in Fig. 2, the data acquisition method may include the following steps:

S210, receiving original image data sent by a client and attribute information corresponding to the original image data.

S220, determining a to-be-analyzed area corresponding to the original image data according to the attribute information corresponding to the original image data.

The attribute information corresponding to the original image data at least includes category information corresponding to the original image data. For example, the category information corresponding to the original image data may be a business card, an invoice, a contract, a label, and the like. Furthermore, attribute information corresponding to the original image data can also be input by a user on an interface of the intelligent terminal.

In this embodiment, the category information corresponding to the original image data is extracted from the attribute information corresponding to the original image data according to a preset sequence; wherein the category information includes: small category information, medium category information, or large category information.

Further, the small category information is any one of subsets of the medium category information, and the medium category information is any one of subsets of the large category information. For example, the large-sized category information may be a contract, the medium-sized category information may be a service-like contract, and the small-sized category information may be a service-like trademark contract or the like.

Specifically, extracting according to the preset sequence means that the small category information, the medium category information, and the large category information are looked for, in that order, in the attribute information corresponding to the original image data.

Further, the attribute information corresponding to the original image data is first searched for small category information; if small category information is found, it is extracted from the attribute information. If small category information is not found, the attribute information is searched for medium category information; if medium category information is found, it is extracted from the attribute information. If medium category information is not found, the attribute information is searched for large category information; if large category information is found, it is extracted from the attribute information.
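The small-then-medium-then-large lookup can be sketched as follows. The attribute keys `small`, `medium`, and `large` and the dictionary representation of the attribute information are assumptions made for illustration.

```python
# Assumed key names for the three category levels, in the preset lookup order
PRESET_ORDER = ("small", "medium", "large")


def extract_category(attributes):
    """Return (level, value) for the first category level present, or None if absent."""
    for level in PRESET_ORDER:
        value = attributes.get(level)
        if value:
            return level, value
    return None
```

With attribute information such as `{"medium": "service contract", "large": "contract"}`, the function returns the medium-level entry, because no small category is present and the search stops at the first level found.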

When small category information is extracted from the attribute information corresponding to the original image data, a first analysis area corresponding to the original image data is determined according to the small category information; when medium category information is extracted, a second analysis area corresponding to the original image data is determined according to the medium category information; when large category information is extracted, a third analysis area corresponding to the original image data is determined according to the large category information. The third analysis area includes the second analysis area, and the second analysis area includes the first analysis area.

In this embodiment, the second analysis area is any one of the sub-areas of the third analysis area, and the first analysis area is any one of the sub-areas of the second analysis area.
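The nesting of the three analysis areas, and the restriction of OCR to the selected area, can be illustrated with axis-aligned rectangles. Representing an area as a rectangle, the `Area` type, and the coordinates below are all hypothetical; the description does not say how an analysis area is represented.

```python
from typing import NamedTuple


class Area(NamedTuple):
    """Rectangle in pixel coordinates; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, other):
        """True when `other` lies entirely inside this area."""
        return (
            self.left <= other.left
            and self.top <= other.top
            and self.right >= other.right
            and self.bottom >= other.bottom
        )


# Illustrative areas: the first sits inside the second, which sits inside the third
ANALYSIS_AREAS = {
    "large": Area(0, 0, 6, 6),    # third analysis area
    "medium": Area(1, 1, 5, 5),   # second analysis area
    "small": Area(2, 2, 4, 3),    # first analysis area
}


def crop(image, area):
    """Return only the pixels of `image` (a list of rows) that fall inside `area`."""
    return [row[area.left:area.right] for row in image[area.top:area.bottom]]
```

The more specific the category, the smaller the area handed to OCR, which is one plausible reading of why the attribute information lets the server obtain the target text more quickly.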

S230, performing Optical Character Recognition (OCR) analysis on the original image data in the to-be-analyzed area of the original image data to obtain target text data corresponding to the original image data.

Further, when small category information is extracted from the attribute information corresponding to the original image data, the target text data is stored in a first storage area corresponding to the small category information in the customer relationship management (CRM) system; when medium category information is extracted, the target text data is stored in a second storage area corresponding to the medium category information in the CRM system; when large category information is extracted, the target text data is stored in a third storage area corresponding to the large category information in the CRM system. The third storage area is larger than the second storage area, and the second storage area is larger than the first storage area.

The embodiment of the invention provides a data acquisition method, wherein a server receives original image data sent by a client and attribute information corresponding to the original image data; then determining a region to be analyzed corresponding to the original image data according to the attribute information corresponding to the original image data; and performing Optical Character Recognition (OCR) analysis on the original image data in the region to be analyzed of the original image data to obtain target text data corresponding to the original image data. That is to say, in the technical scheme of the invention, the client is only responsible for the acquisition of the original image data, but not for the identification of the original image data; the server is only responsible for the identification of the original image data and not for the acquisition of the original image data. The data acquisition method provided by the embodiment of the invention not only can improve the data acquisition efficiency, but also can ensure the accuracy and the safety of the acquired data; moreover, the technical scheme of the embodiment of the invention is simple and convenient to realize, convenient to popularize and wider in application range.

Example three

Fig. 3 is a schematic structural diagram of a data acquisition device according to a third embodiment of the present invention, where this embodiment is applicable to a case of acquiring paper data, and as shown in fig. 3, the device may include the following modules: a receiving module 310 and an obtaining module 320; wherein,

the receiving module 310 is configured to receive original image data sent by a client and predetermined attribute information corresponding to the original image data.

The obtaining module 320 is configured to obtain target text data corresponding to the original image data from the original image data according to the attribute information corresponding to the original image data.

Further, the obtaining module includes: a determining unit and an acquiring unit; wherein,

the determining unit is used for determining the to-be-analyzed area corresponding to the original image data according to the attribute information corresponding to the original image data.

The acquiring unit is used for performing OCR analysis on the original image data in the to-be-analyzed area of the original image data to acquire target text data corresponding to the original image data.

Specifically, the determination unit includes: an extraction subunit and a determination subunit; wherein,

the extraction subunit is used for extracting the category information corresponding to the original image data from the attribute information corresponding to the original image data according to a preset sequence, wherein the category information includes small category information, medium category information, or large category information; the determining subunit is used for determining a first analysis area corresponding to the original image data according to the small category information when small category information is extracted from the attribute information corresponding to the original image data; determining a second analysis area corresponding to the original image data according to the medium category information when medium category information is extracted; and determining a third analysis area corresponding to the original image data according to the large category information when large category information is extracted; wherein the third analysis area includes the second analysis area, and the second analysis area includes the first analysis area.

Further, the apparatus further comprises a storage module, which is used for storing the target text data in a first storage area corresponding to the small category information in the CRM system when small category information is extracted from the attribute information corresponding to the original image data; storing the target text data in a second storage area corresponding to the medium category information in the CRM system when medium category information is extracted; and storing the target text data in a third storage area corresponding to the large category information in the CRM system when large category information is extracted; wherein the third storage area is larger than the second storage area, and the second storage area is larger than the first storage area.

Further, the receiving module 310 is further configured to receive a data operation request sent by the client; and the data operation request carries the data identification of the target text data.

The obtaining module 320 is further configured to perform corresponding operations on the target text data according to the data identifier of the target text data.

The data acquisition device provided by the embodiment of the invention firstly receives original image data sent by a client and attribute information corresponding to the original image data; and then the server acquires target text data corresponding to the original image data from the original image data according to the attribute information corresponding to the original image data. That is to say, in the technical scheme of the invention, the client is only responsible for the acquisition of the original image data, but not for the identification of the original image data; the server is only responsible for the identification of the original image data and not for the acquisition of the original image data. And the server can acquire the target text data corresponding to the original image data in the original image data more quickly according to the attribute information corresponding to the original image data, so that the data acquisition efficiency is improved, and the accuracy and the safety of the acquired data are ensured. The existing data acquisition method adopts a manual entry mode to acquire data or adopts an OCR mode to acquire data. Therefore, compared with the prior art, the data acquisition device provided by the embodiment of the invention not only can improve the data acquisition efficiency, but also can ensure the accuracy and the safety of the acquired data; moreover, the technical scheme of the embodiment of the invention is simple and convenient to realize, convenient to popularize and wider in application range.

The data acquisition device provided by the embodiment of the invention can execute the data acquisition method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.

Example four

Fig. 4 is a schematic structural diagram of interaction between a client and a server according to a fourth embodiment of the present invention. The present embodiment provides a preferred example based on the above-described embodiments. As shown in Fig. 4, the CRM system provided by the embodiment of the present invention mainly consists of three parts: a mobile phone terminal 410, a server terminal 420, and a CRM overall system 430.

In this embodiment, the data acquisition method of the CRM system is mainly executed by the client and the server. The client can be installed on the two major mobile phone platforms, iOS and Android. The server provides the API and the storage and query of data content required by the mobile terminal; the server is also responsible for uniformly constructing verticalized models for different industries, so that unified, platform-based services are provided to various industries.

In this embodiment, basic system information such as the enterprise information, the business type, and the CRM type needs to be configured in advance in the dashboard. An employee of the enterprise who uses the CRM system installs the CRM system client on a mobile phone; an IT (information technology) person in the enterprise adds the employee's information to the CRM system, opens the employee's usage rights, and provides an initial user name and password with which the employee activates the client. Further, the usage rights granted to employees in different departments of the enterprise may vary.

The employee installs the client of the CRM system, a mobile phone APP, on the mobile phone, opens the APP, and logs in through the APP login module with the initial user name and password provided by the IT personnel; the password can be changed as required after login. After the user logs in, the APP automatically generates a list of the types that can be entered for the industry in which the enterprise operates. This type list is constructed uniformly by the server: the server builds verticalized models for different industries and, according to the industry of the enterprise to which the employee belongs, generates the verticalized model required by that enterprise, namely the entry type list. After logging in, the user selects the type to be entered in the APP entry module and invokes the mobile phone camera to photograph the required content, thereby obtaining original image data; after the photograph is taken, the mobile phone client sends the original image data to the server.

The server receives, through the API layer, the original image data sent by the client and obtains the category information corresponding to the original image data. The category information includes small category information, medium category information, or large category information. As shown in fig. 4, the large category information may include: contracts, business cards, cards and licenses, bills, and text forms. The server determines an analysis area and a storage area corresponding to the original image data according to the category information, performs OCR analysis in the analysis area (namely, a data mart area), and stores the target text data obtained by the OCR analysis, together with the original image data, in the storage area. After the analysis is finished, the target text data is sent to the mobile phone client, where the employee can check whether the content recognized by OCR is correct and manually correct any errors or wrongly recognized characters. According to the attribute information of the original image, the specific module of the overall CRM system to which the target text data belongs can be determined, and the target text data is stored in that module, which makes data management more convenient. For example, the overall CRM system comprises a sales management module, a customer management module, a service management module, a call center module, a market management module, a system management module and the like.
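The server-side flow of this paragraph (resolve the analysis area and the storage area from the category information, run OCR, store the text together with the image, return the text for review) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the area names, the `Server` and `Record` classes, and `handle_upload` are hypothetical, and the OCR engine is passed in as a plain callable so that no particular OCR library is assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical area names keyed by category level. The source only states that
# the category information determines an analysis area and a storage area.
AREAS: Dict[str, Tuple[str, str]] = {
    "small": ("analysis_area_1", "storage_area_1"),
    "medium": ("analysis_area_2", "storage_area_2"),
    "large": ("analysis_area_3", "storage_area_3"),
}


@dataclass
class Record:
    image: bytes          # original image data
    text: str             # target text data produced by OCR
    analysis_area: str    # where the OCR analysis was (conceptually) run


@dataclass
class Server:
    # Injected OCR function: takes image bytes, returns recognized text.
    ocr: Callable[[bytes], str]
    storage: Dict[str, List[Record]] = field(default_factory=dict)

    def handle_upload(self, image: bytes, category_level: str) -> str:
        """Resolve both areas, run OCR, store text and image, return the text."""
        if category_level not in AREAS:
            raise ValueError(f"unknown category level: {category_level!r}")
        analysis_area, storage_area = AREAS[category_level]
        text = self.ocr(image)  # conceptually executed inside analysis_area
        record = Record(image=image, text=text, analysis_area=analysis_area)
        self.storage.setdefault(storage_area, []).append(record)
        return text  # sent back to the client for manual checking


def fake_ocr(image: bytes) -> str:
    """Stand-in for a real OCR engine: decodes the bytes as UTF-8 text."""
    return image.decode("utf-8")


server = Server(ocr=fake_ocr)
result = server.handle_upload(b"Sample contract text", "large")
```

In a real deployment `fake_ocr` would be replaced by an actual OCR engine; injecting it keeps the routing logic testable on its own.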

Furthermore, the user can browse any entered content through the APP browsing module on the mobile phone, and can search all entered content through the APP search module. The user can also edit, delete and copy the stored data through other modules of the mobile phone client.
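The browse, search, edit, delete and copy operations can be illustrated with a small in-memory store in which each piece of target text data is addressed by a data identifier. The class `TextStore` and its method names are hypothetical; the embodiment does not specify how the store or its identifiers are implemented.

```python
import itertools
from typing import Dict, List


class TextStore:
    """In-memory stand-in for the server-side store of target text data."""

    def __init__(self) -> None:
        self._items: Dict[int, str] = {}
        self._ids = itertools.count(1)  # data identifiers: 1, 2, 3, ...

    def add(self, text: str) -> int:
        """Store new text and return its data identifier."""
        data_id = next(self._ids)
        self._items[data_id] = text
        return data_id

    def get(self, data_id: int) -> str:
        """Browse a single stored item."""
        return self._items[data_id]

    def edit(self, data_id: int, new_text: str) -> None:
        """Replace the text of an existing item."""
        if data_id not in self._items:
            raise KeyError(data_id)
        self._items[data_id] = new_text

    def delete(self, data_id: int) -> None:
        """Remove an item; raises KeyError if it does not exist."""
        del self._items[data_id]

    def copy(self, data_id: int) -> int:
        """Duplicate an item and return the identifier of the copy."""
        return self.add(self._items[data_id])

    def search(self, keyword: str) -> List[int]:
        """Return the identifiers of all items containing the keyword."""
        return sorted(i for i, t in self._items.items() if keyword in t)


store = TextStore()
first = store.add("Invoice 001")
duplicate = store.copy(first)
store.edit(first, "Invoice 002")
matches = store.search("Invoice")
```

After these calls the original item holds the edited text while the copy keeps the earlier text, and both are found by the keyword search.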

The server can provide enterprises with a unified platform that integrates operations such as editing, deleting, copying and querying, and the collected information can undergo customized data processing before being output to a specific enterprise. Small and medium-sized enterprises without a CRM system can directly adopt the general CRM data platform provided by the system.

In the client-server interaction system provided by this embodiment, the client can be installed on the two major mobile phone platforms, iOS and Android, to collect original image data. The server is responsible for recognizing the original image data and for providing the API required by the mobile terminal and the storage and query of data content; it is also responsible for uniformly constructing the vertical models of different industries. The technical scheme provided by the embodiment of the invention not only can improve the data acquisition efficiency, but also can ensure the accuracy and the safety of the acquired data; moreover, the technical scheme of the embodiment of the invention is simple and convenient to realize, convenient to popularize and wider in application range.

EXAMPLE five

Fig. 5 is a schematic structural diagram of a server according to a fifth embodiment of the present invention. FIG. 5 illustrates a block diagram of an exemplary server 512 suitable for use in implementing embodiments of the present invention. The server 512 shown in fig. 5 is only an example and should not bring any limitations to the function and scope of the use of the embodiments of the present invention.

As shown in FIG. 5, the server 512 is in the form of a general purpose device. Components of server 512 may include, but are not limited to: one or more processors or processing units 515, a system memory 528, and a bus 518 that couples the various system components including the system memory 528 and the processing unit 515.

Bus 518 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

The server 512 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by server 512 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 528 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 530 and/or cache memory 532. The server 512 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 534 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 518 through one or more data media interfaces. System memory 528 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 540 having a set (at least one) of program modules 542 may be stored, for example, in system memory 528, such program modules 542 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. The program modules 542 generally perform the functions and/or methods of the described embodiments of the invention.

The server 512 may also communicate with one or more external devices 514 (e.g., keyboard, pointing device, display 524, etc.), with one or more devices that enable a user to interact with the server 512, and/or with any devices (e.g., network card, modem, etc.) that enable the server 512 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 522. Also, the server 512 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via the network adapter 520. As shown, the network adapter 520 communicates with the other modules of the server 512 via the bus 518. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the server 512, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

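The processing unit 515 executes various functional applications and data processing by running programs stored in the system memory 528, for example, to implement the data acquisition method provided by the embodiment of the present invention: receiving original image data sent by a client and attribute information corresponding to the original image data; and acquiring target text data corresponding to the original image data from the original image data according to the attribute information corresponding to the original image data.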

According to the server provided by the embodiment of the invention, the server firstly receives original image data sent by a client and attribute information corresponding to the original image data; and then the server acquires target text data corresponding to the original image data from the original image data according to the attribute information corresponding to the original image data. That is to say, in the technical scheme of the invention, the client is only responsible for the acquisition of the original image data, but not for the identification of the original image data; the server is only responsible for the identification of the original image data and not for the acquisition of the original image data. And the server can acquire the target text data corresponding to the original image data in the original image data more quickly according to the attribute information corresponding to the original image data, so that the data acquisition efficiency is improved, and the accuracy and the safety of the acquired data are ensured. The existing data acquisition method adopts a manual entry mode to acquire data or adopts an OCR mode to acquire data. Therefore, compared with the prior art, the server provided by the embodiment of the invention can not only improve the data acquisition efficiency, but also ensure the accuracy and the safety of the acquired data; moreover, the technical scheme of the embodiment of the invention is simple and convenient to realize, convenient to popularize and wider in application range.
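The division of labour described above, in which the client only captures and sends while the server only receives and recognizes, implies a request that carries both the image and its attribute information. One possible encoding is sketched below. The JSON layout, the field names `image` and `attributes`, and the functions `build_request` and `parse_request` are assumptions of this sketch; the embodiment does not define a wire format.

```python
import base64
import json
from typing import Dict, Tuple


def build_request(image: bytes, attributes: Dict[str, str]) -> str:
    """Client side: bundle the photographed image with its attribute information."""
    return json.dumps({
        # Base64 keeps arbitrary image bytes safe inside a JSON string.
        "image": base64.b64encode(image).decode("ascii"),
        "attributes": attributes,
    })


def parse_request(body: str) -> Tuple[bytes, Dict[str, str]]:
    """Server side: recover the image bytes and the attribute information."""
    payload = json.loads(body)
    return base64.b64decode(payload["image"]), payload["attributes"]


body = build_request(b"\x89PNG-bytes", {"category_level": "large"})
image, attributes = parse_request(body)
```

The server would then pass `image` and `attributes` to its recognition step; no recognition logic is needed on the client.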

EXAMPLE six

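The sixth embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the data acquisition method provided in all the embodiments of the present invention: receiving original image data sent by a client and attribute information corresponding to the original image data; and acquiring target text data corresponding to the original image data from the original image data according to the attribute information corresponding to the original image data.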

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A data acquisition method, applied to a server, the method comprising:

receiving original image data sent by a client and attribute information corresponding to the original image data; wherein the attribute information corresponding to the original image data includes category information corresponding to the original image data, and the category information includes: small category information, medium category information, or large category information;

acquiring target text data corresponding to the original image data from the original image data according to the attribute information corresponding to the original image data;

the acquiring, from the original image data, target text data corresponding to the original image data according to the attribute information corresponding to the original image data includes:

extracting category information corresponding to the original image data from the attribute information corresponding to the original image data according to a preset sequence, and determining a region to be analyzed corresponding to the original image data according to the category information; wherein the small category information, the medium category information and the large category information are extracted from the attribute information corresponding to the original image data according to the preset sequence;

and performing Optical Character Recognition (OCR) analysis on the original image data in the region to be analyzed of the original image data to obtain target text data corresponding to the original image data.

2. The method according to claim 1, wherein the determining the region to be analyzed corresponding to the original image data according to the category information comprises:

when the small category information is extracted from the attribute information corresponding to the original image data, determining a first analysis area corresponding to the original image data according to the small category information;

when the medium category information is extracted from the attribute information corresponding to the original image data, determining a second analysis area corresponding to the original image data according to the medium category information;

when the large category information is extracted from the attribute information corresponding to the original image data, determining a third analysis area corresponding to the original image data according to the large category information; wherein the third analysis area comprises the second analysis area; the second analysis area comprises the first analysis area.

3. The method of claim 2, further comprising:

when the small category information is extracted from the attribute information corresponding to the original image data, storing the target text data into a first storage area corresponding to the small category information in a customer relationship management (CRM) system;

when the medium category information is extracted from the attribute information corresponding to the original image data, storing the target text data into a second storage area corresponding to the medium category information in the CRM system;

when the large category information is extracted from the attribute information corresponding to the original image data, storing the target text data into a third storage area corresponding to the large category information in the CRM system; wherein the third storage area is larger than the second storage area; the second storage area is larger than the first storage area.

4. The method of claim 1, further comprising:

receiving a data operation request sent by the client; wherein the data operation request carries a data identifier of the target text data;

and performing corresponding operation on the target text data according to the data identification of the target text data.

5. A data acquisition apparatus, characterized in that the apparatus comprises: a receiving module and an acquisition module; wherein,

the receiving module is used for receiving original image data sent by a client and predetermined attribute information corresponding to the original image data; wherein the attribute information corresponding to the original image data includes category information corresponding to the original image data, and the category information includes: small category information, medium category information, or large category information;

the acquisition module is used for acquiring target text data corresponding to the original image data from the original image data according to the attribute information corresponding to the original image data;

wherein the acquisition module comprises: a determining unit and an acquiring unit; wherein,

the determining unit is used for extracting category information corresponding to the original image data from the attribute information corresponding to the original image data according to a preset sequence, and determining a region to be analyzed corresponding to the original image data according to the category information; wherein the small category information, the medium category information and the large category information are extracted from the attribute information corresponding to the original image data according to the preset sequence;

the acquiring unit is used for performing OCR analysis on the original image data in the region to be analyzed of the original image data to acquire target text data corresponding to the original image data.

6. The apparatus according to claim 5, wherein the determining unit comprises at least: a determining subunit; wherein,

the determining subunit is configured to determine, when the small category information is extracted from the attribute information corresponding to the original image data, a first analysis area corresponding to the original image data according to the small category information; when the medium category information is extracted from the attribute information corresponding to the original image data, determine a second analysis area corresponding to the original image data according to the medium category information; when the large category information is extracted from the attribute information corresponding to the original image data, determine a third analysis area corresponding to the original image data according to the large category information; wherein the third analysis area comprises the second analysis area; the second analysis area comprises the first analysis area.

7. The apparatus of claim 6, further comprising: a storage module, used for storing the target text data into a first storage area corresponding to the small category information in a customer relationship management (CRM) system when the small category information is extracted from the attribute information corresponding to the original image data; when the medium category information is extracted from the attribute information corresponding to the original image data, storing the target text data into a second storage area corresponding to the medium category information in the CRM system; when the large category information is extracted from the attribute information corresponding to the original image data, storing the target text data into a third storage area corresponding to the large category information in the CRM system; wherein the third storage area is larger than the second storage area; the second storage area is larger than the first storage area.

8. The apparatus according to claim 5, wherein the receiving module is further configured to receive a data operation request sent by the client; wherein the data operation request carries a data identifier of the target text data;

the acquisition module is further configured to perform corresponding operations on the target text data according to the data identifier of the target text data.

9. A server, comprising:

one or more processors;

a memory for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data acquisition method of any one of claims 1-4.

10. A storage medium on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the data acquisition method of any one of claims 1 to 4.