
HK1104190B - Digital still camera with audio decoding and coding - Google Patents


Info

Publication number
HK1104190B
Authority
HK
Hong Kong
Prior art keywords
audio
still camera
digital still
data
printed
Prior art date
Application number
HK07108959.5A
Other languages
Chinese (zh)
Other versions
HK1104190A1 (en)
Inventor
沈望傅
道恩.德斯蒙德.许
陈得伟
林清芳
威利.平
彭莫刚
Original Assignee
创新科技有限公司
Priority date
Application filed by 创新科技有限公司 filed Critical 创新科技有限公司
Priority claimed from PCT/SG2004/000418 external-priority patent/WO2005059643A1/en
Publication of HK1104190A1 publication Critical patent/HK1104190A1/en
Publication of HK1104190B publication Critical patent/HK1104190B/en

Description

Digital still camera with audio decoding and encoding
Technical Field
The present invention relates to digital still cameras with audio decoding and encoding, to printable audio formats, and to corresponding methods; particularly, but not exclusively, to digital still cameras that can decode previously encoded audio and encode audio, and to a printable audio format for the encoded audio.
Background
There have been many proposals for encoding audio associated with an image or document, such as a photographic image, so that the encoded audio can be printed as an audio format with the image or document, or printed separately for application to the image or document. The printed audio format is subsequently scanned using a scanner so that the encoded audio can be decoded and reproduced. This requires a separate scanner that can communicate with the sound reproduction system or that can download the scanned data into the sound reproduction system. Such scanners are also prone to errors when reading printed audio formats, as they are manually operated.
Disclosure of Invention
According to a preferred aspect, there is provided a digital still camera including:
(a) a photo imaging system for capturing a single still photo image containing printed material in a printed audio format;
(b) a processor for extracting encoded audio data from the single still photographic image;
(c) a decoder for receiving the encoded audio data and decoding the encoded audio data into an audio signal; and
(d) an audio output for outputting the audio signal as audio.
Data storage means may be provided for storage of at least one of said encoded audio data and said audio signal.
The decoder/encoder may encode and decode using at least one of a short-time Fourier transform and code-excited linear prediction.
The digital still camera may further include a first light source for generating a first light beam that is directable to a desired position, for positioning a lens of the digital still camera at a prescribed distance from the desired position in relation to the printed audio format. The digital still camera may further comprise a second light source spaced from said first light source for generating a second light beam orientable to said desired position, said first and second light beams being coincident at the desired position when said lens is at the prescribed distance from said printed audio format.
There may also be a third light source spaced from both said first and second light sources for producing a third light beam orientable to said desired position; the first, second, and third light beams are substantially coincident at the desired position when the lens is at the prescribed distance from, and parallel to, the printed audio format. Alternatively, all three beams may originate from a single light source and may be focused at three different points.
Still alternatively, there may be an accessory for enabling the lens of the digital still camera to be located at a prescribed distance from, and parallel to, a desired position related to the printed audio format, the accessory including a bracket to which the camera may be attached so that the camera is located at a fixed position at the prescribed distance above the printed audio format. The bracket may include: a bottom having an opening through which the lens can capture the image; and a plurality of sidewalls extending from the bottom for the prescribed distance. At least one of the plurality of sidewalls may include at least one light source for illuminating the printed audio format. The at least one light source may be remote from the digital still camera so as to be outside the viewable area of the lens.
The bracket may include: a bottom having an opening through which the lens can capture the image; and a plurality of legs extending from the bottom for the prescribed distance.
The digital still camera may further include a viewfinder including a viewfinder field, the viewfinder field including a plurality of viewfinder indicators for placing an image of the printed audio format in a desired location when the digital still camera is substantially properly positioned relative to the printed audio format.
In another preferred aspect, there is provided a digital still camera for reproducing an audio signal encoded into a printable audio format, the digital still camera comprising:
(a) a photo imaging system for capturing a single still photo image containing printed material in a printed audio format;
(b) a processor for extracting encoded audio data from the single still photographic image;
(c) a decoder for receiving the encoded audio data and decoding the encoded audio data into an audio signal; and
(d) data storage means for storage of at least one of said encoded audio data and said audio signal.
The digital still camera further includes an audio output for outputting the audio signal as audio.
For both aspects, the digital still camera may further include: an amplifier for amplifying the audio signal; and a converter for converting the digital audio to analog audio. The audio output may be selected from a speaker and an output jack for an earphone or headphone.
The imaging system may also be used to take photographs and include an image capture device. At least one microphone may also be provided for capturing audio signals associated with the photograph; and a converter for converting the analog audio to digital audio. The decoder may also be an encoder for encoding the input digital audio signal into encoded input audio data capable of being printed in a printable audio format.
The processor may embed the encoded input audio data in an associated photograph such that printing the photograph causes the encoded input audio data to be printed therewith as a printed audio format. Instead of embedding the encoded audio data in the associated photograph, the encoded input audio data may be stored in a data storage device with the associated photograph. The digital still camera further includes a printer for printing the print audio format.
The encoded input audio data may be stored separately from the photograph, but with a data connection to the photograph.
The microphone may be selected from the group consisting of built-in to the camera and independent of the camera but operatively connected to the camera.
The decoder/encoder may encode and decode using at least one of a short-time Fourier transform and code-excited linear prediction.
In another preferred aspect, there is provided a method for reproducing an audio signal encoded into a printed audio format, the method comprising:
(a) positioning a digital still camera adjacent to the printed audio format with a lens of the digital still camera facing the printed audio format, the printed audio format being within a focus range of the lens;
(b) capturing a single still photographic image of the printed audio format in the digital still camera;
(c) processing the single still photographic image in the printed audio format in the digital still camera to produce printed audio format image data;
(d) processing the print audio format image data to obtain an audio signal; and
(e) reproducing the audio signal as audio.
Processing the print audio format may include:
(a) retrieving an audio tag from the printed audio format image data;
(b) searching a database of stored audio for stored audio having the same audio tag; and
(c) if stored audio with the audio tag is found, retrieving the stored audio and using it as the audio signal.
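A minimal sketch of the tag-lookup path above: if a clip with the extracted tag is still in the audio database, it is reused directly, otherwise the printed audio is decoded. All function names, the dictionary-backed database, and the pre-parsed `tag` field are illustrative assumptions, not details from the patent:

```python
# Hedged sketch: reuse stored audio when its tag is found in the database,
# else fall back to decoding the printed audio format. Names are hypothetical.

def reproduce_from_tag(printed_image_data, audio_db, decode_fn):
    """Return an audio signal, preferring a database hit over full decoding."""
    tag = extract_audio_tag(printed_image_data)  # (a) read the audio tag
    stored = audio_db.get(tag)                   # (b) search the database by tag
    if stored is not None:
        return stored                            # (c) reuse the stored audio
    return decode_fn(printed_image_data)         # otherwise decode the print

def extract_audio_tag(printed_image_data):
    # Placeholder: the patent encodes the tag as solid/hollow header marks in
    # the central track; here it is simply a pre-parsed field.
    return printed_image_data["tag"]
```

When the tag misses the database, the full decode path runs instead, which is the behavior described for the decoding alternative below.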
Alternatively, processing the print audio format may include:
(a) extracting encoded audio data from the image;
(b) decoding the encoded audio data into a digital audio signal; and
(c) converting the digital audio signal into an analog audio signal.
At least one of the encoded audio data and the digital audio signal may be stored.
In another preferred aspect, there is provided a method for sound reproduction of an audio signal encoded into a printable audio format, the method comprising:
(a) capturing a single still photographic image of the printable audio format using a digital still camera;
(b) processing the single still photographic image to extract therefrom a data signal corresponding to the audio signal;
(c) converting the data signal into the audio signal; and
(d) reproducing the audio signal.
Processing the print audio format may further include decoding the data signal into a digital audio signal, the digital audio signal being converted into the audio signal.
For all methods, both the data signal and the digital audio signal may be stored, and both the digital audio signal and the audio signal may be amplified. The decoding is achieved using at least one of a short-time Fourier transform and code-excited linear prediction.
The capturing operation may include:
(a) locating at least three central track markers of the printed audio format;
(b) verifying at least three central marks;
(c) locating the remaining marks on the central track; and
(d) sorting all central marks.
The operation of locating the at least three central markers includes searching for the at least three markers in a central region and, after locating the at least three markers, checking the positions of all remaining markers. A block matching search may be performed.
If the search in the central region fails, the search continues in an upper region; if the search in the upper region also fails, the search continues in a bottom region. If the search fails in all three regions, blind decoding is performed. The upper, central, and bottom regions are predetermined.
In another preferred aspect, there is provided a method for sound reproduction of an audio signal encoded into a printable audio format, the method comprising:
(a) capturing a single still photographic image of the printable audio format using a digital still camera;
(b) processing the single still photographic image to extract therefrom a data signal corresponding to the audio signal;
(c) retrieving an audio tag from the printed audio format image data;
(d) searching a database of stored audio for stored audio having the same audio tag; and
(e) if stored audio with the audio tag is found, retrieving and reproducing the stored audio.
Another aspect provides a printable audio format comprising:
(a) a plurality of spaced and parallel tracks; and
(b) a print encoding of the audio signal between the plurality of tracks.
Each track may comprise at least one marker. There may be a plurality of equally spaced marks in each track. The indicia may be solid or hollow. The marks may be circular. There may be three tracks, including a top track, a center track, and a bottom track.
The print encoding is achieved using at least one of a short-time Fourier transform and code-excited linear prediction. Alternatively or additionally, encoding may be performed using a short-time Fourier transform to produce a plurality of frames, with every other frame being deleted before printing.
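The frame-halving idea can be sketched as follows. This is an illustrative reconstruction, not the patent's codec: the Hann window, the 256-sample frame length, and the 128-sample hop are assumptions for the example.

```python
# Hedged sketch: compute magnitude STFT frames, then drop every other frame
# before "printing", halving the data that must be encoded.
import numpy as np

def stft_frames(signal, frame_len=256, hop=128):
    """Return magnitude STFT frames of a 1-D signal (frames x bins)."""
    window = np.hanning(frame_len)
    n = 1 + (len(signal) - frame_len) // hop
    frames = [np.abs(np.fft.rfft(window * signal[i * hop:i * hop + frame_len]))
              for i in range(n)]
    return np.array(frames)

def halve_frames(frames):
    """Keep every other frame, as the text describes, before printing."""
    return frames[::2]
```

The decoder would then interpolate the missing frames on playback; the overlap between adjacent windows makes the deleted frames largely recoverable.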
The transversely aligned marks in each of the plurality of tracks can be used to determine the position of all other marks. The central track may have markings encoding digital data. The digital data may include an audio tag.
The print encoding may be implemented using gray levels, each dot having at least one white guard bit within a cell configuration. The cell configuration may be a 2 × 2 cell, with the dot in one segment of the 2 × 2 cell and all other segments used for guard bits. Alternatively, the cell configuration may be a 1 × 2 cell, without horizontal guard bits.
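As a sketch of the 2 × 2 guard-bit layout, the snippet below renders one gray dot per cell with the other three segments left white, so neighbouring dots never touch. The cell geometry, the chosen quadrant, and the 0-255 gray scale are assumptions for illustration:

```python
# Hedged sketch of the 2 x 2 cell: the dot occupies one quadrant, the
# remaining three segments are white (255) guard bits.
import numpy as np

def render_dots_2x2(gray_levels):
    """Render a row of gray dots, one per 2 x 2 cell, guard bits white."""
    cells = np.full((2, 2 * len(gray_levels)), 255, dtype=np.uint8)
    for i, g in enumerate(gray_levels):
        cells[0, 2 * i] = g  # dot in the top-left segment of its cell
        # cells[0, 2*i+1], cells[1, 2*i], cells[1, 2*i+1] stay white guards
    return cells
```

A 1 × 2 cell would drop the bottom row, removing the horizontal guard bits as the alternative describes.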
Frequencies in the range of 0 to 125 Hz may be removed from the audio signal prior to encoding.
The print encoding may include: a first part, in which encoding is performed in a first direction; and a second part, in which encoding is performed in a second direction.
The print audio format may also be arranged in a plurality of parts, the connection order of the plurality of parts being contained in the header data of each part.
The key audio data may be placed closer to the central track.
Alternatively or additionally, the printable audio format may include a central mark and a printed encoding of an audio signal centered on and arranged around the central mark. The printed encoding of the audio signal may include a plurality of short-time Fourier transform audio frames arranged radially in columns, with lower frequencies in a radially outer portion of each column and higher frequencies in a radially inner portion of each column. The audio frames may use magnitude only.
The second-to-last preferred aspect provides a photograph comprising an image and the printable audio format described above. The printable audio format may be a portion of the image, or may be in a margin around the image. The printable audio format may be on a self-adhesive label for attachment to one of a photograph, an album page containing the photograph, and a picture frame containing the photograph.
According to a final aspect, there is provided a method for luminance smoothing, comprising:
(a) determining a brightness of each extracted mark of the print audio format; and
(b) changing a brightness level of an area surrounding each extracted mark to obtain more uniform illumination of the printed audio format.
A mesh of the extracted marks may be formed prior to step (a), the extracted marks constituting the vertices of the mesh. The luminance may be determined by interpolating from the extracted marks.
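The smoothing step might be sketched as below. Nearest-mark interpolation of the brightness field and the target brightness value are assumptions for illustration, not the patent's exact method:

```python
# Hedged sketch of luminance smoothing: measure a local brightness at each
# extracted mark, interpolate a brightness field over the image (here by
# nearest mark, the simplest choice), and rescale pixels toward a target.
import numpy as np

def smooth_luminance(image, mark_xy, mark_brightness, target=200.0):
    """Scale pixels by target/local_brightness, interpolated from the marks."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    field = np.empty((h, w))
    best = np.full((h, w), np.inf)
    for (mx, my), b in zip(mark_xy, mark_brightness):
        d = (xs - mx) ** 2 + (ys - my) ** 2
        closer = d < best
        field[closer] = b          # brightness of the nearest mark so far
        best[closer] = d[closer]
    return np.clip(image * (target / field), 0, 255)
```

A mesh-based variant would interpolate bilinearly inside each mesh cell of the extracted marks rather than snapping to the nearest mark.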
Drawings
In order that the invention may be fully understood and readily put into practical effect, there shall now be described by way of non-limitative example only preferred embodiments of the present invention, the description being with reference to the accompanying illustrative drawings, in which:
FIG. 1 is a front view of the first embodiment;
FIG. 2 is a rear view of the first embodiment;
FIG. 3 is a block diagram of the embodiment of FIG. 1;
FIG. 4 is a front view of a photograph having a printed audio format;
FIG. 5 is a side view of a first embodiment for capturing a print audio format;
FIG. 6 is a front view of the second embodiment;
FIG. 7 is a front view of the third embodiment;
FIG. 8 is a front view of the fourth embodiment;
FIG. 9 is a front view of the fifth embodiment;
FIG. 10 is a front view of the sixth embodiment;
FIG. 11 is a front view of the seventh embodiment;
FIG. 12 is a front view of a preferred form of printed audio format;
FIG. 13 is an enlarged view of a portion of the print audio format of FIG. 12;
FIG. 14 is a diagram of a first form of cell configuration;
FIG. 15 is a diagram of a second form of cell configuration;
FIG. 16 is an illustration of a second form of a print audio format;
FIG. 17 is an illustration of a third form of a print audio format;
FIG. 18 is a flowchart for capturing images and playback audio in a print audio format;
FIG. 19 is a flowchart for capturing audio and creating a print audio format;
FIG. 20 is three illustrations of a central mark search;
FIG. 21 is an illustration of four examples of refinement thresholds;
FIG. 22 is an illustration of a sample position for testing whether the indicia are solid or hollow;
FIG. 23 is an illustration of an alternative marker configuration;
FIG. 24 is an illustration of a radial configuration;
FIG. 25 is a graphical representation of luminance smoothing; and
FIG. 26 is an illustration of an eighth embodiment.
Detailed Description
Referring to fig. 1 to 3, a digital still camera 10 is shown. Although a simple form of digital still camera is shown, the present invention is applicable to all forms of digital still cameras including single-lens reflex cameras, digital moving picture cameras in still camera mode, mobile phones having digital camera functions, and personal digital assistants having digital camera functions, and therefore the term "digital camera" should be construed accordingly.
The camera 10 has an imaging system, generally designated 12, including a lens 14, a viewfinder 16, a shutter 18, a built-in flash 20, a shutter switch 22, and other control devices 24. The camera 10 has therein: an image capture device 36, such as a charge coupled device; a processor 26 for processing image data received in a known manner; a memory 28 for storing each image as image data; and a controller 30 for controlling the data sent for display on the display 32. The processor 26 performs conventional digital photographic image processing, such as compression and formatting of the captured photographic images. The imaging system 12, including the image capture device 36, is capable of taking and capturing photographic images of everyday scenes. The imaging system 12 may have fixed or variable focus, zoom, and other functions found in digital still cameras.
The camera 10 can be used to capture, extract and reproduce audio from the printed audio format 48 (FIG. 4). The camera 10 is oriented in a manner to be described below toward the print audio format 48, and the imaging system 12 is used to capture images of the print audio format 48. The processor 26 extracts the encoded audio data from the print audio format 48 and passes the encoded audio data to the decoder 34 for decoding the audio data. The decoder 34 receives and decodes the encoded audio data to give audio data.
The decoder 34 sends the decoded audio to an amplifier 38 for amplification as analog audio so that it can be output by a sound reproduction device 40. A common form of digital-to-analog converter converts the digital audio for reproduction; it may be included in the amplifier 38 or may be part of the converter 37, as shown.
The processor 26 may be separate from or integrated with the decoder 34 and/or the amplifier 38. The sound reproduction means 40 may be a speaker 42 and/or an earphone/headphone jack 44.
In addition, the digital still camera 10 may have a built-in microphone 46 to enable the camera 10 to capture and store audio simultaneously or substantially simultaneously with taking a picture. The audio may be stored in a database in the storage means 47 for subsequent processing and possible subsequent reproduction. When stored in the storage device 47, the audio tag is appended to the audio as an identifier so that the audio can be found when needed. The output of the microphone 46 is encoded in the codec 34 and then sent for printing. The printing may be performed by the printer 41 built in the camera 10 or by a separate printer. When printed, the encoded audio is in a print audio format. The output of the microphone 46 may be converted from analog to digital in an analog-to-digital converter. The analog-to-digital converter may be part of the converter 37, making the converter 37 a digital-to-analog and analog-to-digital converter.
Preferably, the lens 14 of the camera 10 is capable of focusing at a close distance (e.g., 4 cm). To assist in this, one of the control devices 24 may provide a macro setting, or a dedicated setting for capturing images of the printed audio format 48.
FIG. 4 shows a document, printed photograph, or other image-bearing print object 42 having an image 44 or other data thereon. The image 44 may occupy all of one surface of the photograph 42, or preferably may have a margin 46 around it. A printed audio format 48 containing encoded audio is located on the photograph 42, preferably in the margin 46. The print audio format 48 may be located behind the photograph 42 if needed or desired.
To keep the lens 14 at a suitable distance from the print audio format 48, an accessory or a built-in facility may be used to assist. FIGS. 6 to 10 show such arrangements. FIG. 6 illustrates the use of a single light source 50 built into the camera 10. Alternatively, the light source may be separate from the camera 10 but removably attached to it. The light source 50 may be a narrow-angle LED or a low-power laser directed or aimed at the photograph 42, specifically at a prescribed location on or adjacent to the printed audio format 48, such as an end or the center. The camera 10 may be moved toward or away from the printed audio format 48. The processor 26 may continuously evaluate the incoming video stream received through the lens 14 to obtain a valid print audio format 48. The problems with this approach are the additional computational requirements of continuous evaluation, uncontrolled lighting, and the perspective distortion introduced when the lens 14 is not parallel to the printed audio format 48.
To solve the problem of continuous evaluation, the embodiment of fig. 7 may be used. There is a second light source 52 spaced from the first light source 50. Further, either or both of light sources 50 and 52 may be built into camera 10 (as shown), or may be separate from or removably attached to camera 10. The light sources 50, 52 are focused so that their beams intersect when the lens 14 is positioned at a prescribed distance from the printed audio format 48. In this manner, the processor 26 need only process the print audio format 48 once.
To address the perspective distortion problem, the embodiment of FIG. 8 may be used. This embodiment is the same as that of FIG. 7, but it also uses a third light source 54. The light source 54 is spaced apart from the light sources 50, 52. When the light beams of the three light sources 50, 52, and 54 are focused on a point, the lens 14 is located at the correct distance from, and parallel to, the printed audio format 48. The light source 54 may also be built into the camera 10 (as shown), or may be separate from or removably attached to the camera 10. Alternatively, the three light sources 50, 52, 54 may be mounted on a bracket (not shown) that is attachable to the camera 10. The bracket may also contain a power supply for the light sources 50, 52, 54.
As shown in FIG. 26, there may be two or three light sources at one location 2601 of the camera 10. Each light source has a beam 2602, the beams diverging from each other. For three light sources, light beam 2602 will form the vertices of a triangle on the printed audio format 48. Instead of two or three light sources, one light source 2603 with one or more lenses to form two or three diverging beams 2602 may be used. When the camera 10 is positioned at the correct distance from the printed audio format 48 and parallel to the printed audio format 48, the diverging beam 2602 will be at or adjacent to a corner 2604 of the printed audio format 48.
Fig. 9 shows a further embodiment. There is shown an attachment 56 that will be placed on and/or around the lens 14 and on the photograph 42 so that the lens 14 is directly above, parallel to, and at the correct distance from the printed audio format 48.
The attachment 56 includes a top portion 58 having a central opening 60, the central opening 60 being shaped and sized to allow the lens 14 to pass through (if the lens 14 protrudes from the camera 10) or to allow the lens 14 to be manipulated through the opening. Either way, the size and shape of opening 60 allows lens 14 to capture images of printed audio format 48. At least two opposing faces 62 depend from the top 58, the faces 62 having a height necessary to bring the lens 14 the correct distance from the printed audio format 48. Face 62 may be made of any suitable material and may be solid, transparent, translucent, or opaque. Preferably, there are four mutually perpendicular faces 62.
If desired, the accessory 56 may include one or more light sources 64 mounted in one or more faces 62 to provide control over the illumination of the printed audio format 48. The light sources 64 may be LEDs and may be powered independently or by the battery of the camera 10. To minimize reflections from glossy or similar surfaces, the light sources 64 are preferably diffused by a diffuser or placed as low as possible in the faces 62 so that they are outside the viewable area of the lens 14. To help achieve this, the accessory 56 may be wider than the length of the printed audio format 48. Thus, the faces 62 may have different widths. Preferably, the internal dimensions (width W and depth D) of the accessory 56 are slightly larger than the corresponding dimensions of the printed audio format 48.
In capturing images of the print audio format 48, it is possible to capture more than just the print audio format 48. Some of the content of the photograph 42 surrounding the printed audio format 48 may also be captured. In this case, after the image is captured, the processor 26 either extracts the encoded audio data from the entire image, or first extracts the data relating to the print audio format from the captured image (discarding the captured non-print-audio-format data) and then extracts the encoded audio data from the print audio format data.
FIG. 10 shows an alternative form in which the faces 62 are replaced by four corner legs 68. Each leg 68 is located where the faces 62 would intersect. The legs 68 preferably have the same height and spacing as the faces 62.
FIG. 11 shows a further embodiment. Apart from the framing frame 11 of the viewfinder 16, the other parts of the camera 10 are unchanged. As shown, the viewfinder 16 has the printed audio format 48 in the field of view of the framing frame 11. With framing indications 13 forming part of the framing frame 11, the printed audio format 48 is substantially correctly framed when the corners of the printed audio format 48 are covered by the framing indications 13; this reduces perspective distortion and helps capture the data in the printed audio format 48 with reasonable accuracy. At least two framing indications 13 are required, one at each end of the printed audio format 48, preferably placed at opposite corners of the printed audio format 48. However, as shown, preferably four framing indications 13 are used. The viewfinder 16 may be optical or electronic, such as an LCD display screen.
If desired, and if the camera 10 has a macro function, the framing indications 13 may appear in the framing frame 11 only when the camera 10 is in macro mode.
Fig. 12 to 14 show preferred forms of the print audio format 48. The print audio format 48 may be encoded and decoded using a short time fourier transform ("STFT") codec and/or a code excited linear prediction ("CELP") codec. The STFT codec uses continuous gray tones, while the CELP codec uses black and white.
The print audio format 48 is a printout containing audio content 49 and various extraction marks for assisting the camera 10. The marks are arranged in a plurality of tracks, preferably three: a top track 70, a central track 72, and a bottom track 74. There may be any number of tracks from one upward (for example one, two, three, four, or five). Regardless of the number of tracks, the audio content 49 surrounds the central track and is located between the tracks. The tracks are generally parallel and equally spaced. As shown, in the case of three tracks 70, 72, and 74, the audio content 49 is located in two regions: one between the top track 70 and the central track 72, and the other between the central track 72 and the bottom track 74.
The tracks 70, 72, and 74 preferably:
are resilient to channel errors;
are rotation invariant;
can withstand local cropping;
are independent of the audio codec, so that the same tracks work with both CELP and STFT;
can fit into a 1″ × 1″ area at 360 dpi;
assist rapid mark extraction;
can store an audio tag inline; and
are flexible and extensible.
The stored audio is captured audio that is digitally (possibly permanently) stored in a camera audio database. The stored audio may be compressed using CELP or other suitable standard compression (e.g., ADPCM). Each audio clip is stored with a unique tag number. When the stored audio is encoded into a print audio format, an audio tag is encoded in a header of the print audio format. During decoding, and after locating the central track, the header is first decoded and the stored audio tags are extracted. Based on the extracted tags, the processor may look for stored audio in a database. If found, it will play back the stored audio. If not found in the database, it will decode the printed audio.
In this way, if the same camera containing the audio database, or the same camera storage device (e.g., flash memory card, memory stick, etc.), is used to capture and play back the audio, and the audio is still stored in the database, the audio can be found through the audio tag, extracted from the database, and played back directly from the database, thereby eliminating the decoding step.
Each track 70, 72, and 74 includes a plurality of equally spaced and vertically aligned marks 76. As shown, the marks 76 are preferably circular, but they may be other shapes, such as square, octagonal, or oval. They may be solid 78 or hollow 80. The marks 76 are preferably rotation invariant to enable fast circle detection. By using a solid mark 78 or a hollow mark 80 to encode a data bit, the marks double as data storage.
The size of the marks 76 and the inter-mark distance are fixed. Thus, only two marks are needed to determine the positions of the other marks, simplifying mark detection. A third mark may be used to verify the first two. The data area height may vary, such variation preferably being encoded in the central track 72.
FIG. 20 shows the detection of the marks 76, which begins by searching for three marks of the central track 72 in a central region 82 (FIG. 20(a)). These marks are verified. If a mark is found to be invalid, the search continues in other regions, as described below. Once three marks are located, the positions of the remaining marks are predicted. A block matching search is then performed to refine the predicted positions. Predicting the positions is important because it reduces search time and improves search reliability.
If the search for three marks in the predetermined central region 82 fails, the search continues in the upper region 84 (FIG. 20(b)). If it also fails, the search continues in the bottom region 86 (FIG. 20(c)). If all three searches fail, the printed audio format 48 is assumed to be in the ideal position and blind decoding is performed. Blind decoding assumes that the central track 72 is at the very center of the captured image.
Given three central mark positions p0, p1, and p2, where p0 is to the left of p1 and p1 is to the left of p2, the next predicted left mark position is pLeftPredicted = p0 + (p0 − p1). The position is refined by performing block matching: pLeft = BlockMatch(pLeftPredicted).
The new position is then used to predict the mark to its left. When no more marks 76 can be found, the search terminates. A similar technique is then used to search for the marks 76 to the right of these three marks.
All the marks in the central track 72 are sorted from left to right.
Given the central track 72, the markings 76 of the peripheral tracks 70 can be predicted.
1. Let the leftmost 4 marks 76 in the central track 72 be p0, p1, p2, and p3;
2. the direction vector is dir = p3 - p0.
p3 is used instead of p1 or p2 because it is farther from p0, so the error due to imperfect block matching is amplified the least;
3. the vector is rotated 90° counterclockwise, giving dirUp = Rotate90(dir);
4. it is normalized: dirUpN = Normalize(dirUp);
5. the leftmost top track 70 mark 76 has the estimated position:
pTopLeftMarkerPredicted = p0 + dirUpN × DistanceToTopRail;
6. a search for the optimal location gives
pTopLeftMarker = BlockMatch(pTopLeftMarkerPredicted).
If pTopLeftMarkerPredicted is outside the image, the position is not refined; and
7. the next top track 70 mark 76 can be estimated as pTopLeftMarker + (dir/3).
These steps are repeated for the bottom track 74. The only change is the 90° rotation, which is clockwise for the bottom track 74.
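Steps 1 to 7 can be sketched as follows, assuming image coordinates with y increasing downward (so a 90° counterclockwise rotation of (x, y) appears as (y, -x)); the block matching refinement of step 6 is omitted, and the numeric values in the example are purely illustrative:

```python
import math

def predict_top_track(p0, p3, dist_to_top, n_marks):
    """Predict top-track mark positions from the leftmost central marks
    p0 and p3 (steps 1-7 above, without block-match refinement)."""
    dir_x, dir_y = p3[0] - p0[0], p3[1] - p0[1]      # step 2: dir = p3 - p0
    up_x, up_y = dir_y, -dir_x                       # step 3: rotate 90 degrees CCW
    norm = math.hypot(up_x, up_y)
    up_x, up_y = up_x / norm, up_y / norm            # step 4: normalize
    # Step 5: first top-track mark, one track distance "up" from p0.
    first = (p0[0] + up_x * dist_to_top, p0[1] + up_y * dist_to_top)
    step_x, step_y = dir_x / 3, dir_y / 3            # step 7: inter-mark step = dir/3
    return [(first[0] + i * step_x, first[1] + i * step_y)
            for i in range(n_marks)]
```

For the bottom track the rotation is clockwise, i.e. (x, y) becomes (-y, x) under the same coordinate convention.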
For the central track 72 it is determined whether the type of mark 76 is solid 78 or hollow 80 in order to extract the encoded digital data. This is the head. Based on the head, the distance to the peripheral (top 70 or bottom 74) track is known. With this information, it is possible to predict where the peripheral marker is likely to be. If the prediction result is within the image, it will be fine-tuned by the search. If the predicted position is outside the image boundary (due to cropping), the predicted position is used for data extraction.
Given a rectangular region R of the sampled image and a template T of a mark, a correlation search is performed from left to right and top to bottom. This can be coupled with minimum distance clustering by using the following heuristic:
for x from left to right in region R
  for y from top to bottom in region R
    perform a correlation calculation of template T at position (x, y) to obtain correlation value c
    if c reflects a very high correlation
      take the distance d to the previous mark
      if d < MinDistance
        replace the previous mark if c is better
      else
        add a new mark and store its location and c
MinDistance is a fixed minimum distance between two marks 76, and the "correlation calculation" may use a sum-of-absolute-differences approximation, which is typically implemented in video processors for block matching in MPEG motion estimation.
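A minimal sketch of this correlation search with minimum distance clustering, using a sum-of-absolute-differences score (lower c means a stronger match) and a Manhattan distance for the clustering test; the scan order, threshold, and distance metric are illustrative assumptions:

```python
def sad(image, template, x, y):
    """Sum of absolute differences between the template and the image
    patch at (x, y); lower means higher correlation."""
    th, tw = len(template), len(template[0])
    return sum(abs(image[y + j][x + i] - template[j][i])
               for j in range(th) for i in range(tw))

def find_marks(image, template, threshold, min_distance):
    """Scan the image; a strong match either replaces a nearby weaker
    candidate (minimum distance clustering) or starts a new mark."""
    th, tw = len(template), len(template[0])
    marks = []  # list of (x, y, score)
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            c = sad(image, template, x, y)
            if c >= threshold:                 # not a strong enough match
                continue
            for k, (mx, my, mc) in enumerate(marks):
                if abs(mx - x) + abs(my - y) < min_distance:
                    if c < mc:                 # better score: replace it
                        marks[k] = (x, y, c)
                    break
            else:
                marks.append((x, y, c))        # no nearby mark: add a new one
    return [(x, y) for x, y, _ in marks]
```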
Verifying the marks 76 in the central track 72 requires:
1. at least 3 marks 76;
2. the distance between marks 1 and 2 and the distance between marks 2 and 3 should be the same; and
3. the line through marks 1 and 2 and the line through marks 2 and 3 should have the same angle, to ensure that the marks are collinear.
If any of the above conditions is not satisfied, the input image is discarded as invalid. This is useful for minimizing computation time when analyzing a continuous video stream.
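The three verification conditions can be sketched as follows; the pixel tolerance `tol` is an assumed parameter, and collinearity is tested via the cross product of the two segments:

```python
import math

def verify_central_marks(marks, tol=2.0):
    """Verify candidate central-track marks: at least 3 marks, equal
    spacing, and collinearity (tolerance `tol` is an assumption)."""
    if len(marks) < 3:                         # condition 1
        return False
    (x1, y1), (x2, y2), (x3, y3) = marks[:3]
    d12 = math.hypot(x2 - x1, y2 - y1)
    d23 = math.hypot(x3 - x2, y3 - y2)
    if abs(d12 - d23) > tol:                   # condition 2: equal spacing
        return False
    # Condition 3: zero cross product means the two segments are collinear.
    cross = (x2 - x1) * (y3 - y2) - (y2 - y1) * (x3 - x2)
    return abs(cross) <= tol * max(d12, d23)
```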
3 tracks 70, 72, and 74 are used, but the extraction operation only requires the central track 72. A missing or partial peripheral track 70, 74 reduces the quality of STFT encoding but does not prevent decoding. CELP encoding, however, requires the accuracy provided by all 3 tracks 70, 72, and 74. The peripheral tracks 70, 74 may be used to improve accuracy. They may also be used for lens distortion correction, because the central track 72 is generally near the optical center of the lens 14, where distortion and defocus are minimal. The central track 72 may also be used to quickly discard invalid printed audio formats 48. Three tracks 70, 72, and 74 provide greater header data capacity than the central track 72 alone.
The height and length of the printed audio format 48 are variable. The length may be varied to provide more bits for the stored audio tag.
The configuration of the printed audio format 48 may be as follows:
inter-mark space, Mg
top marks, diameter Md     ○ ○ ○
gap 1, G1
top data, Dt               × × ×
                           × × ×
                           × × ×
gap 2, G2
central marks              ○ ○ ○
gap 3, G3
bottom data, Db            × × ×
                           × × ×
                           × × ×
gap 4, G4
bottom marks               ○ ○ ○
○ = mark
× = data
The following attributes are derived from the parameters of the printed audio format 48:
center-to-top peripheral track distance = Md + G1 + G2 + Dt
center-to-bottom peripheral track distance = Md + G3 + G4 + Db
slot 1 = ceil(Md/2) + G1
slot 2 = -(floor(Md/2) + G2)
slot 3 = ceil(Md/2) + G3
slot 4 = -(floor(Md/2) + G4)
The center-to-peripheral track distances are used to predict the positions of the peripheral tracks 70, 74. Slots 1 to 4 are used during data extraction.
The audio tag may be encoded in the central track 72 using solid marks 78 and hollow marks 80 (representing "0" and "1", respectively). The two types of marks 78, 80 are distinguished by comparing the center of the mark with an adjacent color. The difference is found and tested against a threshold to determine whether the mark is hollow 80 or solid 78. Using the difference gives better tolerance to illumination variation, as shown in fig. 21, where:
+ is the mark center; and
× is the adjacent pixel, centered on the edge of a hollow mark; for example, if the mark diameter is 7 (radius 3.5), the distance is about 2.
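The difference test can be sketched as follows; the `offset` and `threshold` values are illustrative assumptions (for a mark of diameter 7, the edge ring lies about 2 pixels from the center):

```python
def classify_mark(image, cx, cy, offset=2, threshold=30):
    """Classify a mark as hollow or solid by comparing the pixel at the
    mark center with a neighbour on the mark's edge ring.  Using the
    difference rather than absolute values tolerates illumination changes.
    `offset` and `threshold` are assumed, illustrative values."""
    center = image[cy][cx]
    neighbor = image[cy][cx + offset]          # an "x" sample on the edge
    # A hollow mark has a light center on a dark ring, so center - edge is large.
    return "hollow" if center - neighbor > threshold else "solid"
```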
The analysis proceeds from left to right, with the least significant bit on the left. This allows the length of the last field (the audio tag) to vary with the length of the printed audio format 48.
One configuration may be:
bit position   description
0              reserved, always 1
1              reserved, always 0
2              first stamp indicator: 0 = additional stamp, 1 = first stamp
3, 4           encoding type (e.g. STFT or CELP)
5              cell configuration: 0 = 1 × 2, 1 = 2 × 2
6-8            for the first print audio, the number of additional print audios; otherwise, the index of this additional print audio (when multiple print audios are used)
9-31           stored audio tag
32             last bit, set to 0, the opposite of bit 0, for detecting an inverted stamp
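A sketch of parsing this header layout; the exact two-bit encoding-type mapping is an assumption, and bits are listed left to right with the least significant bit first, as described above:

```python
def parse_header(bits):
    """Parse the header bits extracted from the central track, following
    the bit layout above (LSB is the leftmost, lowest-index bit)."""
    if bits[0] != 1 or bits[1] != 0 or bits[-1] != 0:
        raise ValueError("invalid or inverted print audio format")

    def field(lo, hi):
        # Assemble an integer from bits lo..hi, least significant first.
        return sum(b << i for i, b in enumerate(bits[lo:hi + 1]))

    return {
        "first_stamp": bits[2] == 1,
        "encoding": ("STFT", "CELP")[field(3, 4) & 1],  # assumed 2-bit mapping
        "cell_config": "2x2" if bits[5] else "1x2",
        "extra_index": field(6, 8),
        "audio_tag": field(9, 31),
    }
```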
As shown in fig. 14, these bits are preferably printed in a 2 × 2 cell 1300 with 3 guard bits, as follows:
data bit (1301)      guard bit (1302)
guard bit (1304)     guard bit (1303)
Each bit may be black or white to represent a "1" or a "0", respectively, or may be a continuous gray tone. The guard bits are white spaces and allow for dot gain. Using cells without guard bits would increase bit errors. The data area 49 may have 82 cells per column and 5 columns per segment. Data flows from left to right, top to bottom.
Black dots spread beyond their edges due to paper absorption and imperfect printing. This problem is addressed by the guard bits (1302, 1303, 1304) in the 2 × 2 cell configuration 1300. The guard bits assume less than one pixel of spread, which yields a clean output. However, further degradation may occur, for example due to lens imperfections during capture, further contaminating the image.
Dots from an inkjet printer spread beyond their pixel edges, resulting in dot gain. The ability of the paper to absorb ink and limit its spread into adjacent areas is critical to minimizing dot gain. If a 300 dpi droplet spreads into one surrounding pixel, it is only equivalent to 150 dpi; if it spreads into two surrounding pixels, it is only equivalent to 75 dpi. For example, if two black dots are printed with a white dot left between them, dot gain causes the white dot to appear gray, or in many cases black. Coated paper resists absorption, while uncoated paper allows greater absorption and thereby exhibits greater gain. Other factors that affect dot gain are ink viscosity, rimming, and mechanical imperfections. These factors are left uncontrolled to allow for variation, which accommodates various printers. What can be controlled is where the dots are placed to minimize interference, as well as the type of ink and paper used.
Dot gain may cause the printout to appear darker, thereby increasing decoding errors. The standard practice is to compensate for dot gain by "lightening" the source image using gamma correction or calibrated correction. The advantage of gamma correction is that the dynamic range is maintained. The disadvantage is that the gain is non-linear and may cause distortion if used excessively. Dot gain control can be improved, empirically, by performing additional gamma correction before combining or compositing with the image. In this case, the additional gamma correction affects only the printed audio format 48, not the image 44. The printer driver may then perform its standard gamma correction on the entire document 42.
The horizontal dot gain may be greater than the vertical dot gain. This may be due to the horizontal movement of the print head, whereby the dots move at a horizontal speed before hitting the paper, causing horizontal smearing. With a small vertical dot gain, the vertical guard bits can be removed and a 1 × 2 cell used.
CELP encodes 16-bit 8 kHz speech into a 4800 bps stream with 30 ms frames. Each CELP frame has 144 bits, or 18 bytes. Reed-Solomon forward error correction codes may be appended to blocks of CELP frames:
a CELP block consists of 25 columns of 82 cells, for a total of 82 × 25 bits = 256.25 bytes;
only 255 bytes per block are used, leaving 10 bits unused;
each block can accommodate 12 CELP frames, occupying 12 × 18 = 216 bytes; and
the remaining 39 bytes are occupied by the Reed-Solomon code.
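The block arithmetic above can be checked directly; the Reed-Solomon capability of correcting floor(parity/2) byte errors accounts for the 19-error figure cited for each block:

```python
# Arithmetic behind the CELP block layout described above.
cells_per_column, columns = 82, 25
block_bits = cells_per_column * columns        # 2050 bits per block
unused_bits = block_bits - 255 * 8             # only 255 bytes of the block are used

frame_bytes = 144 // 8                         # one 30 ms CELP frame = 18 bytes
data_bytes = 12 * frame_bytes                  # 12 CELP frames per block
parity_bytes = 255 - data_bytes                # Reed-Solomon parity bytes
correctable_errors = parity_bytes // 2         # RS corrects floor(parity/2) byte errors
```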
The characteristics of the CELP printed audio format are:
each printed audio format is independent;
for an audio duration of 2.5 seconds, the size of the printed audio format is approximately 1″ × 0.5″;
each block can be decoded independently and tolerates 19 single-byte errors;
a block is an autonomous unit of data;
there are 82 cells per column;
the printed audio format has a capacity of 14,350 bits, or about 1.8 KB, with 175 columns of 82 cells.
After subtracting the error correction code, the effective capacity is about 1.5 KB.
Error tolerance refers to a single block. However, in terms of channel errors, blocks at the edges of the image, away from the optical center, are more error-prone than blocks in the middle. Higher error tolerance can be used for these high-risk blocks, but this reduces capacity. An alternative is to interleave the data in the printed audio format 48. The advantage of interleaving is that the capacity remains unchanged. The disadvantage is that the entire printed audio format 48 must be extracted before decoding can begin; if one side of the printed audio format is missing from the image, the entire printed audio format may be unusable. Interleaving some blocks, rather than the entire printed audio format 48, may provide a better compromise.
In STFT encoding, 8-bit 8000 Hz speech is first converted to an STFT using the following parameters:
a window size of 256;
a Hanning window function;
a hop size of 128.
The data is then converted to magnitude and phase, giving 128 magnitudes and 128 phases per frame. The 128 magnitudes are encoded and the 128 phases are discarded. Our co-pending PCT and US patent application entitled "Method and System to Process a Digital Image", filed contemporaneously herewith, based on and claiming priority from US provisional application 60/531,029 filed on 19 December 2003, the entire contents of which are incorporated herein by reference, discloses a method that provides an efficient and fault-tolerant process for reconstructing the phase during decoding.
The 128 8-bit values are encoded using a single column, so that each column carries a complete frame. As described above, the cell configuration is 2 × 2, comprising one data bit and 3 guard bits, the same as the CELP configuration. Unlike digital CELP, the data is presented in continuous gray tones rather than black or white. The color may be inverted so that 255 becomes pure black and 0 becomes white.
To improve error tolerance, the STFT bands may be divided into 3 groups, with the more important lower bands placed closer to the central track 72. For 82 cells per STFT column, bands 32 to 72 may run from the central track 72 to the bottom track 74; bands 0 to 31 and bands 72 to 81 may run from the central track 72 to the top track 70. This principle can be extended to 128 cells per column.
If 128 cells per column are used, the size increases. To reduce the size, 82 cells are encoded, giving the same size as with CELP. Keeping only 82 of the 128 bands means the frequency range extends only to 2.6 kHz instead of 4 kHz.
The frequency range can be increased by discarding the first four frequency bands (0 to 125 Hz), since small loudspeakers cannot reproduce them. From the 48th band upward, every other band is encoded, because the power at higher frequencies is likely to be weaker than at lower bands. This effectively increases the frequency range to about 3.7 kHz. During decoding, a missing band is obtained by linear interpolation from the bands around it. The first four bands and the last few bands remain zero.
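One band selection consistent with the scheme above can be written out and checked; the exact top band index is our assumption (the selection below yields an upper edge of about 3.8 kHz, in line with the "about 3.7 kHz" figure):

```python
SAMPLE_RATE = 8000
WINDOW = 256
band_hz = SAMPLE_RATE / WINDOW        # 31.25 Hz per STFT band

# Discard bands 0-3 (0-125 Hz), keep every band up to band 47, then every
# other band from band 48 upward until 82 bands are selected.
dense = list(range(4, 48))            # 44 contiguous bands
n_sparse = 82 - len(dense)            # remaining cells: every other band
sparse = list(range(48, 48 + 2 * n_sparse, 2))
bands = dense + sparse
top_freq = bands[-1] * band_hz        # upper edge of the encoded range
```

During decoding, the skipped bands between the sparse entries would be filled by linear interpolation from their neighbours.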
Color inkjet printers can only print solid colors, in most cases cyan, magenta, yellow, and black (CMYK). When printing a gray "dot", small black dots are printed in a specific pattern, in a process called halftoning, to simulate gray. The human eye sees gray rather than individual dots because the dots are too small to be resolved by the eye, whose resolving power is about 1 arc-minute; anything smaller than 1 arc-minute is averaged. A scanner, however, can resolve finer than 1 arc-minute, so in practice it sees dots rather than gray.
The process of deriving gray values from a halftoned image is inverse halftoning. The low-pass characteristic of the camera lens 14 may be used to perform the "averaging". Additional averaging may be performed by area sampling during extraction. Because of halftoning, analog gray-tone encoding may require a higher-resolution printer.
The camera lens bends the geometry of the image, causing barrel and pincushion distortion. The "resolution" of the lens also deteriorates and defocuses away from its center. Lens distortion correction may be performed to correct this. The problem can be further mitigated by placing critical data near the central track 72.
The exposure time affects the analog (gray scale) data because it determines not only the "gain" but also the ability to resolve different shades of gray. An erroneous exposure time may lead to overexposure or underexposure of the image and may compress the gray levels, resulting in poor data extraction, especially for analog encoding. Incorrect exposure may make the audio sound "thin" due to overexposure, or give it excessive "echo" and "distortion" due to underexposure. Using the illustrated opaque mask covering the printed audio format 48 may help overcome this problem.
Extracting audio data from the print audio format 48 includes linear interpolation from the marker 76 positions to obtain the cell positions.
For digital extraction, thresholding is performed dynamically. Extraction starts with bilinear sampling of all cells of a segment, and the average is used as the threshold to convert gray levels to binary values. A segment is bounded by 4 adjacent marks 76 in a rectangular configuration. This method assumes that the "0" and "1" bits are evenly distributed. It works better than a fixed threshold, especially under non-uniform illumination. Using the entire printed audio format 48 rather than one segment to derive the average would give a more even distribution of "0" and "1", but assumes that illumination is constant over the entire image, which is likely unrealistic. The minimum and maximum extracted values are also stored to refine the threshold.
A white dot appears darker if its adjacent dots are black. As scanning proceeds from left to right, top to bottom, the threshold is adjusted if the top or left dot is black, improving accuracy, especially in the out-of-focus regions near the edges of the printed audio format. Similarly, the threshold is adjusted when the surrounding pixels are white. Fig. 21 shows the following adjustments:
left and top pixels white: new threshold = 0.1 × maximum + 0.9 × threshold (fig. 21(a))
top pixel black: new threshold = 0.1 × minimum + 0.9 × threshold (fig. 21(b))
left pixel black: new threshold = 0.1 × minimum + 0.9 × threshold (fig. 21(c))
left and top pixels black: new threshold = 0.18 × minimum + 0.82 × threshold (fig. 21(d))
The weights were determined empirically.
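The four adjustments can be sketched as a single function, using the empirically determined weights above:

```python
def adjusted_threshold(threshold, min_val, max_val, left_black, top_black):
    """Adjust the binarisation threshold using the already-decoded left
    and top neighbours (weights per fig. 21, determined empirically)."""
    if left_black and top_black:
        return 0.18 * min_val + 0.82 * threshold   # fig. 21(d)
    if left_black or top_black:
        return 0.1 * min_val + 0.9 * threshold     # fig. 21(b), (c)
    return 0.1 * max_val + 0.9 * threshold         # fig. 21(a): both white
```

`min_val` and `max_val` are the stored minimum and maximum extracted values mentioned above.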
For analog (grayscale) extraction, the average of 9 pixels is taken: one central pixel and 8 surrounding pixels, each offset by half a pixel, to obtain a more stable gray value. A bias weighting favoring the center pixel may be used to improve accuracy. The gray values are then inverted and stored for decoding.
Figs. 16 and 17 show ways to increase the audio duration.
As shown in fig. 16, the printed audio format 48 may be bisected along the central track 72. The upper half 90 encodes a first period of audio, while the lower half 92 encodes a second period of audio with reduced frequency bands or smaller cells. To prevent audible interruptions (caused by uneven illumination), the lower half 92 may be encoded in reverse order at the junction 94 between the upper 90 and lower 92 halves, so that the junction 94 is at the same end of both halves 90, 92.
Additionally or alternatively, 2× time scaling may be performed by dropping every other STFT frame. Extraction reverses this by performing 0.5× time scaling, slowing the audio back down. Furthermore, horizontal compression can be achieved by using 1 × 2 cells, in which two guard bits are deleted to give the structure:
data bit
guard bit
thereby halving the space required for each bit.
Fig. 17 shows a case where multiple print audios are used; here there are four, 1601, 1602, 1603, and 1604. Their images are preferably captured simultaneously; alternatively, they may be captured separately as needed or desired. To address the uneven illumination problem within an opaque lens attachment, the encoding direction alternates: the first 1601 is encoded from left to right and the second 1602 from right to left, and the same applies to the last two, 1603 and 1604. In this way, images of 1601 and 1602 may be captured together, followed by 1603 and 1604. The processor combines all the data into one, using the header data to control the task, and all the audio is played back.
Fig. 18 shows the entire process for audio reproduction:
-capturing an image (1700)
-extracting data of the print audio format 48 from the image (1701)
-finding 3 central marks by searching (1702)
-extracting header data (1703)
-extract audio tag (1704)
-searching a database for an audio tag (1705)
-if the audio tag is in the database (1706), the audio is extracted (1711), then amplified (1709) and rendered
-if the audio tag is not in the database (1706), the audio is decoded (1707), the decoded audio is then converted (1708) and amplified (1709), or amplified (1709) and converted (1708)
-audio is output (1710)
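The tag lookup and decode fallback at the heart of this flow can be sketched as follows; all arguments are hypothetical stand-ins for the numbered steps:

```python
def reproduce(header, payload, database, decode, play):
    """Sketch of the playback flow of fig. 18: look up the audio tag in
    the database and fall back to decoding the printed data."""
    tag = header["audio_tag"]          # step 1704: extract the audio tag
    audio = database.get(tag)          # step 1705: search the database
    if audio is None:                  # tag not in the database (1706)
        audio = decode(payload)        # step 1707: decode the STFT/CELP data
    play(audio)                        # steps 1708-1710: convert, amplify, output
    return audio
```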
Fig. 19 shows a process for encoding audio:
-capturing an image (1800)
-recording audio (1801)
- converting the audio (1802)
-coded audio (1803)
-tagging audio (1804)
-insert header data (1805)
-storing the encoded data (1806)
-last sent for printing (1807).
If, for audio reproduction, the output (1710) is produced by a camera 10 used in conjunction with a computer, sound system, or the like, the conversion (1708) and amplification (1709) may be omitted.
Fig. 23 shows two alternative formats for the printed audio format 48. Fig. 23(a) shows a case where perspective distortion is not taken into account, so only two marks 76 are required per audio content 49.
The marks 76 may be disposed anywhere around or near the periphery of the audio content 49, but are preferably located at or near its corners.
If perspective distortion correction is to be included, at least 3 marks 76 are required per audio content 49. This is shown in fig. 23(b). Likewise, the marks 76 are disposed around or near the periphery of each audio content 49, mostly at or near its corners. This is not essential, however; as shown, one mark 76' may be located in the middle of one side edge of the audio content 49.
As shown in fig. 24, the audio format 210 may be printed using circular marks, with the individual audio content 49 arranged concentrically around a central mark 76. The STFT magnitude frames, arranged in columns 212, may extend radially from the mark 76, with the low bands at the periphery and the high bands adjacent the mark 76.
Fig. 25 shows luminance smoothing. Here, the luminance of each extracted mark 76 is used to predict the luminance of its surrounding area. Given the predicted luminance, the area around the extracted mark 76 may be darkened or lightened to obtain a more uniform illumination of the printed audio format 48.
First, a mesh 310 is formed whose vertices are the luminances of the extracted marks 76. The luminance within the grid 310 may be estimated by bilinear interpolation from the extracted mark 76 luminances. Higher-order interpolation, such as cubic interpolation, may also be used. The resulting luminance map can then be used for brightness flattening.
In fig. 25, the solid circles are extracted marks 76, and Li indicates their respective luminances. The luminance value is the average gray value of the extracted mark or, for a color mark, the luminance Li = (0.299 × red) + (0.587 × green) + (0.114 × blue), where red, green, and blue are the average red, green, and blue values of the mark.
To correct the luminance at an arbitrary point P, the following processing is performed.
Let Lmax = max(Li) be the luminance of the brightest mark.
The luminance Lp at point P may be interpolated from L1, L2, L3, L4, or more adjacent vertices.
Let Gp be the extracted gray level at point P.
Then the luminance-corrected gray level is Gp' = Gp × (Lmax / Lp).
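The interpolation and correction can be sketched as follows, using bilinear interpolation over one grid cell of the mesh 310 (the four surrounding mark luminances); the numeric values in the example are illustrative:

```python
def interpolate_luminance(l00, l10, l01, l11, fx, fy):
    """Bilinearly interpolate the four mark luminances at the corners of
    a grid cell; fx, fy are fractional positions in [0, 1] within the cell."""
    top = l00 * (1 - fx) + l10 * fx
    bottom = l01 * (1 - fx) + l11 * fx
    return top * (1 - fy) + bottom * fy

def corrected_gray(gp, lp, l_max):
    """Brightness-flattened gray level: Gp' = Gp * (Lmax / Lp)."""
    return gp * (l_max / lp)
```

A point midway between marks of luminance 100 and 200 is assigned luminance 150, and a gray level of 60 there is lifted toward the brightest mark's level.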
Whilst there has been described in the foregoing description preferred embodiments of the present invention, it will be understood by those skilled in the technology concerned that many variations or modifications in details of design, construction and operation may be made without departing from the present invention.

Claims (56)

1. A digital still camera, comprising:
(a) a photo imaging system for capturing a single still photo image containing printed material in a printed audio format;
(b) a processor for extracting encoded audio data from the single still photographic image;
(c) a decoder for receiving the encoded audio data and decoding the encoded audio data into an audio signal; and
(d) an audio output for outputting the audio signal as audio.
2. The digital still camera of claim 1, further comprising data storage means for storage of at least one of said encoded audio data and said audio signal.
3. The digital still camera of claim 2, further comprising: an amplifier for amplifying the audio signal; and a converter for converting the digital audio to analog audio for reproduction.
4. The digital still camera of claim 1, wherein the audio output is a speaker or an output jack for a headset.
5. The digital still camera as in claim 1, wherein the photo imaging system is further for taking a photo and comprises an image capture device.
6. The digital still camera of claim 5, further comprising: at least one microphone for capturing an audio signal associated with the photograph; and an analog-to-digital converter for converting the audio signal to an input digital audio signal.
7. The digital still camera of claim 6, wherein said decoder is further an encoder for encoding said input digital audio signal into encoded input audio data printable in a printable audio format.
8. The digital still camera of claim 7, wherein the processor embeds the encoded input audio data in an associated photograph such that printing the photograph causes the encoded input audio data to be printed therewith as a printed audio format.
9. The digital still camera of claim 7, wherein the encoded input audio data is stored in a data storage device with the associated photograph, rather than embedding the encoded audio data in the associated photograph.
10. The digital still camera of claim 7, further comprising a printer for printing said printed audio format.
11. The digital still camera of claim 7, wherein the encoded input audio data is stored separately from the photograph but with a data connection to the photograph.
12. The digital still camera of claim 6, wherein said microphone is built into said camera or is independent of said camera but is operatively connected to said camera.
13. The digital still camera as in claim 7, wherein said decoder/encoder encodes and decodes using at least one of a short time fourier transform and code excited linear prediction.
14. The digital still camera of claim 1, further comprising a first light source for generating a first light beam that is directable to a desired position to position a lens of the digital still camera at a prescribed distance from the desired position in relation to the printed audio format.
15. The digital still camera of claim 14, further comprising a second light source spaced from said first light source for producing a second light beam orientable to said desired location, said first and second light beams being co-incident at the desired location when said lens is a prescribed distance from said printed audio format.
16. The digital still camera of claim 15, further comprising a third light source spaced from both said first and second light sources for producing a third light beam orientable to said desired position; the first, second, and third light beams are substantially co-incident at the desired location when the lens is a prescribed distance from and parallel to the printed audio format.
17. The digital still camera of claim 1, further comprising an accessory for enabling a lens of the digital still camera to be positioned at a prescribed distance from and parallel to a desired position related to the printed audio format, the accessory including a cradle to which the camera can be attached so that the camera is positioned at a fixed position at the prescribed distance above and parallel to the printed audio format.
18. The digital still camera of claim 14, further comprising at least one other light source for providing at least one other light beam, each of the other light beams being directed to one other location when the lens is a prescribed distance from and parallel to the printed audio format, the first light source and the at least one other light source being at one location.
19. The digital still camera of claim 1, further comprising a viewfinder, the viewfinder comprising a viewfinder field, the viewfinder field including a plurality of viewfinder indicators for placing an image of the printed audio format in a desired location when the digital still camera is substantially properly positioned relative to the printed audio format.
20. The digital still camera of claim 17, wherein the stand comprises: a bottom having an opening through which the lens can capture the image; and a plurality of sidewalls extending from the bottom for the prescribed distance.
21. The digital still camera of claim 20, wherein at least one of said plurality of side walls includes at least one light source for illuminating said printed audio format.
22. The digital still camera of claim 21, wherein said at least one light source is remote from said digital still camera so as to be outside the viewable area of said lens.
23. The digital still camera of claim 17, wherein the stand comprises: a bottom having an opening through which the lens can capture the image; and a plurality of legs extending from the base for the prescribed distance.
24. A digital still camera for reproducing an audio signal encoded into a print audio format, the digital still camera comprising:
(a) a photo imaging system for capturing a single still photo image containing printed material in a printed audio format;
(b) a processor for extracting encoded audio data from the single still photographic image;
(c) a decoder for receiving the encoded audio data and decoding the encoded audio data into an audio signal; and
(d) data storage means for storage of at least one of said encoded audio data and said audio signal.
25. The digital still camera of claim 24, further comprising an audio output for outputting said audio signal as audio.
26. The digital still camera of claim 24, further comprising: an amplifier for amplifying the audio signal; and a converter for converting the digital audio to analog audio.
27. The digital still camera of claim 25, wherein the audio output is a speaker or an output jack for a headset.
28. The digital still camera of claim 24, wherein the imaging system is further for taking pictures and includes an image capture device.
29. The digital still camera of claim 28, further comprising: at least one microphone for capturing an audio signal associated with the photograph; and an analog-to-digital converter for converting the audio signal to an input digital audio signal.
30. The digital still camera of claim 24, wherein said decoder is further an encoder for encoding said input digital audio signal into encoded input audio data printable in a printable audio format.
31. The digital still camera of claim 30, wherein the processor embeds the encoded input audio data in an associated photograph such that printing the photograph causes the encoded input audio data to be printed therewith as a printed audio format.
32. The digital still camera of claim 30, wherein the encoded input audio data is stored in a data storage device with the associated photograph, rather than embedding the encoded audio data in the associated photograph.
33. The digital still camera of claim 30, further comprising a printer for printing said printed audio format.
34. The digital still camera of claim 30, wherein the encoded input audio data is stored separately from the photograph but with a data connection to the photograph.
35. The digital still camera of claim 29, wherein said microphone is built into said camera or is separate from but operatively connected to said camera.
36. The digital still camera of claim 32, wherein said decoder/encoder encodes and decodes using at least one of a short time fourier transform and code excited linear prediction.
37. A method for reproducing an audio signal encoded into a printed audio format, the method comprising:
(a) positioning a digital still camera adjacent to the printed audio format with a lens of the digital still camera facing the printed audio format, the printed audio format being within a focus range of the lens;
(b) capturing a single still photographic image of the printed audio format in the digital still camera;
(c) processing the single still photographic image in the printed audio format in the digital still camera to produce printed audio format image data;
(d) processing the print audio format image data to obtain an audio signal; and
(e) the audio signal is reproduced as audio.
38. The method of claim 37, wherein said processing said print audio format image data comprises:
(a) obtaining an audio label from the print audio format image data;
(b) searching a database of stored audio to obtain stored audio having the same audio tag; and
(c) if stored audio having the audio tag is found, retrieving the stored audio and using it as the audio signal.
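The tag-lookup flow of claim 38, with the embedded-data decoding of claim 39 as a fallback, can be sketched as follows. This is an illustrative sketch only, not part of the claims: the fixed 8-byte tag field, the dictionary-keyed database, and the payload layout are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of claim 38's lookup: read the audio tag from the
# captured print, search the stored-audio database, and fall back to the
# embedded encoded data when no stored audio carries that tag.
# The 8-byte tag width and byte layout are assumptions, not from the patent.

def reproduce_from_print(image_data: bytes, db: dict) -> bytes:
    tag = image_data[:8]        # (a) obtain the audio tag (assumed fixed width)
    stored = db.get(tag)        # (b) search the database of stored audio
    if stored is not None:      # (c) tag found: use the stored audio
        return stored
    return image_data[8:]       # otherwise use the data embedded in the print

# Usage: a tag decoded from the printed audio format keys into the database.
db = {b"TAG-0001": b"stored-audio-bytes"}
print(reproduce_from_print(b"TAG-0001" + b"payload", db))
```

The dictionary stands in for whatever indexed storage the camera's data storage device actually provides; any keyed lookup with a miss signal would serve the same role.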
39. The method of claim 37, wherein said processing said print audio format image data comprises:
(a) extracting encoded audio data from the single still photographic image;
(b) decoding the encoded audio data into a digital audio signal; and
(c) converting the digital audio signal to an analog audio signal.
40. The method of claim 39, further comprising storing at least one of the encoded audio data and the digital audio signal.
41. The method of claim 39, further comprising amplifying at least one of the digital audio signal and the analog audio signal.
42. The method of claim 37, wherein the audio is reproduced through a speaker or through an output jack for a headset.
43. The method of claim 39, wherein said decoding is accomplished by using at least one of a short time Fourier transform and code excited linear prediction.
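The extract-decode-convert pipeline of claims 39 to 43 can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the bit-per-cell reading, the byte packing, and the unsigned 8-bit PCM recentring stand in for the short-time Fourier transform or code excited linear prediction decoding the claims actually recite.

```python
# Minimal sketch of the claim 39 pipeline: (a) extract encoded audio data
# from the captured image, (b) decode it to a digital audio signal.
# Bit layout and 8-bit PCM decoding are illustrative assumptions only.

def extract_encoded_audio(image_bits: list) -> bytes:
    """(a) pack bits read from the photographed print into bytes, MSB first."""
    out = bytearray()
    for i in range(0, len(image_bits) - 7, 8):
        byte = 0
        for bit in image_bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

def decode_to_pcm(encoded: bytes) -> list:
    """(b) decode to a digital audio signal; here, unsigned 8-bit samples
    recentred to signed values (a stand-in for STFT/CELP decoding)."""
    return [b - 128 for b in encoded]

# Usage: two bytes' worth of bits become two signed samples.
bits = [1, 0, 0, 0, 0, 0, 0, 0,   # 0x80 -> sample 0
        0, 1, 1, 1, 1, 1, 1, 1]   # 0x7F -> sample -1
samples = decode_to_pcm(extract_encoded_audio(bits))
```

Step (c) of claim 39, conversion to an analog signal, would be performed by a DAC and so falls outside a software sketch.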
44. A method for sound reproduction of an audio signal encoded into a printable audio format, the method comprising:
(a) capturing a single still photographic image of the printable audio format using a digital still camera;
(b) processing the single still photographic image to extract therefrom a data signal corresponding to the audio signal;
(c) converting the data signal into the audio signal; and
(d) reproducing the audio signal.
45. The method of claim 44, wherein the operation of processing the single still photographic image further comprises decoding the data signal into a digital audio signal, the digital audio signal being converted into the audio signal.
46. The method of claim 45, further comprising storing at least one of the data signal and the digital audio signal.
47. The method of claim 45, further comprising amplifying at least one of the digital audio signal and the audio signal.
48. The method of claim 44, wherein the reproduction of the audio signal is achieved through a speaker or an output jack for headphones.
49. The method of claim 45, wherein the decoding is accomplished by using at least one of a short-time Fourier transform and code excited linear prediction.
50. The method of claim 44, wherein the capturing operation comprises:
(a) locating at least three central track markers of the printable audio format;
(b) verifying the at least three central track markers;
(c) locating the remaining markers on the central track; and
(d) arranging all central track markers.
51. The method of claim 50, wherein said operation of locating said at least three central markers comprises searching a central region for said at least three markers and, after locating said at least three markers, checking the positions of all remaining markers.
52. The method of claim 51, further comprising performing a block matching search.
53. The method of claim 51, wherein if a search in the central region fails, then continuing the search in an upper region, and if a search in the upper region also fails, then continuing the search in a bottom region.
54. The method of claim 53, wherein if the search fails in all three regions, blind decoding is performed.
55. The method of claim 53, wherein the upper region, the central region, and the bottom region are predetermined.
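The staged region search of claims 50 to 55 can be sketched as below. The region names, the candidate-marker records, and the three-marker threshold are illustrative assumptions; the claims specify only that predetermined central, upper, and bottom regions are tried in turn, with blind decoding as the final fallback.

```python
# Sketch of the staged marker search of claims 50-55: try the central
# region first, then the upper region, then the bottom region (claim 53);
# a None result signals that blind decoding should be performed (claim 54).
# Candidate records and region geometry are assumptions for illustration.

REGIONS = ("central", "upper", "bottom")  # predetermined search order

def locate_center_markers(candidates):
    """Return the central-track markers arranged left to right,
    or None when the search fails in all three regions."""
    for region in REGIONS:
        found = [c for c in candidates if c["region"] == region]
        if len(found) >= 3:                       # at least three markers located
            return sorted(found, key=lambda m: m["x"])  # arrange the markers
    return None                                   # all regions failed

# Usage: three central markers are found and arranged by x position.
marks = [{"region": "central", "x": 5},
         {"region": "central", "x": 1},
         {"region": "central", "x": 3}]
arranged = locate_center_markers(marks)
```

Claim 52's block matching search would refine where each candidate marker sits in the captured image; here the candidates are assumed to be already detected.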
56. A method for sound reproduction of an audio signal encoded into a printable audio format, the method comprising:
(a) capturing a single still photographic image of the printable audio format using a digital still camera;
(b) processing the single still photographic image to extract therefrom a data signal corresponding to the audio signal;
(c) retrieving an audio tag from the printable audio format;
(d) searching a database of stored audio to obtain stored audio having the same audio tag; and
(e) if stored audio having the audio tag is found, retrieving the stored audio and reproducing it.
HK07108959.5A 2003-12-19 2004-12-17 Digital still camera with audio decoding and coding HK1104190B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US53147103P 2003-12-19 2003-12-19
US60/531,471 2003-12-19
PCT/SG2004/000418 WO2005059643A1 (en) 2003-12-19 2004-12-17 Digital still camera with audio decoding and coding, a printable audio format, and method

Publications (2)

Publication Number Publication Date
HK1104190A1 HK1104190A1 (en) 2008-01-04
HK1104190B true HK1104190B (en) 2010-12-31
