Zoom Search Engine FAQ - Image indexing (ImageInfo Plugin)

Q. What does the ImageInfo plugin do?
Q. Does this metadata conform to any standards?
Q. Which tags are processed and which aren't?
Q. Which image file formats does the ImageInfo plugin support?
Q. How do I edit meta information in my JPEGs?
Q. How do I turn on image indexing in Zoom?
Q. How do I configure the type of image metadata being indexed?
Q. How do I index specific internal meta information?
Q. How do I link text to an image to make searching more accurate?
Q. How do I link a thumbnail to an image to make searching more accurate?
Q. I do not want Zoom to index certain images, for example images used as thumbnails on my web pages. What can I do?
Q. How do I search for images with specific technical information, for example all images with width 800 pixels shot using a Sony camera?

Q. How do I customize the appearance of images and thumbnails in the search results?

Q. What does the ImageInfo plugin do?

Zoom Search Engine v5.0 introduces a feature that lets users search for images such as photographs and diagrams. Searching uses the metadata stored in each image file. Formats such as JPEG, PNG and TIFF can store informative text that describes the image, as well as technical metadata that records how the photo was taken, such as the camera make and model, whether the flash fired, the shutter speed and the aperture value. The ImageInfo plugin extracts this metadata so that Zoom can index it according to its configuration. The technical information is usually present, but the informative metadata is often missing from these files.

Q. Does this metadata conform to any standards?

Yes. Digital cameras save images as specified by the EXIF (Exchangeable Image File) image file format, a standard defined by the Japan Electronics and Information Technology Industries Association (JEITA). The specification uses existing file formats such as JPEG (Joint Photographic Experts Group) or TIFF (Tagged Image File Format) with the addition of specific metadata tags.

In addition, a multimedia news exchange format called the Information Interchange Model (IIM) was established by the International Press Telecommunications Council (IPTC) to carry additional information such as a caption, news category or dateline. The metadata elements of IIM are commonly known as the "IPTC headers" of digital image files.

ImageInfo extracts this metadata based on the EXIF and IIM standards. Basically, within the file header, there are segments of markup data (called tags or chunks) that contain information pertaining to image data structure, recording, image data characteristics and hardware. ImageInfo extracts some of the more useful textual meta info and ignores the others.

Q. Which tags are processed and which aren't?

The following table lists the tags that are processed and the order in which they are prioritized. The number in parentheses next to a JPEG tag is its priority for that field, with (1) checked first. For example, to get the "Author" meta information for your JPEG image, the ImageInfo plug-in:

1. First looks for the Exif tag 40093.
2. If Exif tag 40093 is missing, then it looks for the IPTC tag 0x50.
3. If IPTC tag 0x50 is missing, then it looks for the Exif tag 315.
4. If none of the above are found, then there is no "Author" meta information.
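
The lookup order above can be sketched as a simple fallback chain. This is a minimal Python illustration, not ImageInfo's actual code: the `exif` and `iptc` dictionaries (tag number mapped to already-decoded text) and the names `AUTHOR_PRIORITY` and `find_author` are assumptions made for the example.

```python
# Priority order for the "Author" field of a JPEG, as listed above.
# Each entry is (source, tag); the first tag that is present wins.
AUTHOR_PRIORITY = [
    ("exif", 40093),  # Windows Author
    ("iptc", 0x50),   # IPTC Byline
    ("exif", 315),    # Exif Artist
]


def find_author(exif, iptc):
    """Return the first available author string, or None if no tag is set."""
    sources = {"exif": exif, "iptc": iptc}
    for source, tag in AUTHOR_PRIORITY:
        value = sources[source].get(tag)
        if value:
            return value
    return None


# Only the IPTC Byline and the Exif Artist tag are present here,
# so the IPTC value is chosen because it ranks higher.
print(find_author({315: "J. Smith"}, {0x50: "John Smith"}))
```

The same pattern would apply to the other fields, each with its own priority list.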

Note that GIFs do not store any meta information other than their width and height.

Field | JPEG (IPTC) | JPEG (Exif) | PNG

Meta information
Author | (2) 0x50 Byline | (1) 40093 Windows Author (via right-click "Properties" Author); (3) 315 Artist | 'tEXt' chunk's "Author"
Description+ | (3) 0x78 Caption; (5) 0x74 Copyright notice; (6) 0x6E Credits AND 0x05 Source | (1) 40095 Windows Subject (via right-click "Properties" Subject); (2) 40092 Windows Comments (via right-click "Properties" Comments); (4) 270 ImageDescription | 'tEXt' chunk's "Description"
Title | (2) 0x69 Headline | (1) 40091 Windows Title (via right-click "Properties" Title) | 'tEXt' chunk's "Title"
Keywords | (2) 0x19 Keywords | (1) 40094 Windows Keywords (via right-click "Properties" Keywords) | 'tEXt' chunk's "Comment"

Technical information
Width | N.A. | (1) 256 ImageWidth; (2) Start of 1st frame width; (3) ExifImageWidth | 'IHDR' chunk's width
Height | N.A. | (1) 257 ImageHeight; (2) Start of 1st frame height; (3) ExifImageHeight | 'IHDR' chunk's height
Make | N.A. | 271 Make | 'tEXt' chunk's "Source"
Model | N.A. | 272 Model | N.A.
ExposureProgram | N.A. | 34850 ExposureProgram | N.A.
FNumber | N.A. | 33437 FNumber | N.A.
Aperture | N.A. | 37378 ApertureValue | N.A.
ShutterSpeed | N.A. | 37377 ShutterSpeed | N.A.
ExposureTime | N.A. | 33434 ExposureTime | N.A.
Flash | N.A. | 37385 Flash | N.A.
DateTaken | N.A. | 36867 DateTimeOriginal | 'tEXt' chunk's "Creation Time"
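
The source does not show how ImageInfo reads the PNG 'IHDR' dimensions or the GIF width and height. As a rough sketch based on the published PNG and GIF file layouts, the two values can be read from the first bytes of a file as follows. The function names are invented for this example.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Read width and height from the IHDR chunk of a PNG byte string."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # The IHDR data starts at offset 16: two big-endian 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def gif_dimensions(data):
    """Read the logical screen width and height from a GIF header."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Two little-endian 16-bit integers follow the 6-byte signature.
    width, height = struct.unpack("<HH", data[6:10])
    return width, height
```

For a file on disk, reading the first 24 bytes in binary mode is enough for either function.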

Q. Which image file formats does the ImageInfo plugin support?

Image files supported by ImageInfo are JPEGs, PNGs*, TIFFs and GIFs**. The table below lists the type of meta information that could be available in the different image file formats:

Field | JPEG | PNG | TIFF | GIF | Example

Meta information
Author | Y | Y | Y | N | John Smith
Description | Y | Y | Y | N | Photo of a beach
Title | Y | Y | Y | N | Pangkor Island, Malaysia
Keywords/Comments | Y | Y | Y | N | beach, sun, sand, vacation

Technical information
Width | Y | Y | Y | Y | 800
Height | Y | Y | Y | Y | 600
Make | Y | Y | Y | N | SONY
Model | Y | N | Y | N | Cybershot
ExposureProgram | Y | N | Y | N | manual control
FNumber | Y | N | Y | N | F2.8
Aperture | Y | N | Y | N | F2.8
ShutterSpeed | Y | N | Y | N | 2.5
ExposureTime | Y | N | Y | N | 1/100
Flash | Y | N | Y | N | fired
DateTaken | Y | Y | Y | N | 2005-11-07 13-48-01

Q. How do I edit meta information in my JPEGs?

There are a few ways you can edit meta information for image files:

From the file manager in Windows XP:
1. Right-click on the image and select "Properties".

Right-click "Properties" screenshot

2. Enter the "Title:", "Subject:", "Author:", "Keywords:" and "Comments:".

Image info properties window

From third-party software:

You can also use third-party software to edit the informative metadata in your JPEG files. Try searching on the internet for "edit metadata jpeg" or "edit Exif" (a list of such tools can be found at http://graphicssoft.about.com/od/exifsoftware/). The steps below use Exifer 2.1.5 from Friedemann Schmidt as an example.

1. Launch Exifer and navigate to the folder that contains the image files.

2. Select the image and choose "EXIF/IPTC: Edit" from the menu.

Exifer edit link from menu screenshot

3. Edit "Author:", "Description:" and "Headline:" from the "Source/Description" tab.

Exifer edit window screenshot

4. Edit "Keywords:" from the "Keywords and categories" tab.

You can fill in the other fields as well. However, only the "Author", "Description", "Headline" and "Keywords" fields are indexed (and only if Zoom is configured to index them).

Q. How do I turn on image indexing in Zoom?

Before you can configure image indexing, download the ImageInfo plug-in, place it inside Zoom's "plugins" folder and restart Zoom.

Go to "Configure: Scan Options" and add the ".jpg" extension to "Scan Extensions".

Add jpeg extension screenshot

Do likewise for all other image file formats that you wish to index. The supported formats are listed under the question on supported image file formats above.

You have now turned on image indexing.

Q. How do I configure the type of image metadata being indexed?

Double-click on the .jpg file extension in "Zoom Indexer Configuration: Scan Options:" to bring up the "Image indexing options".

Image file configure screenshot

To allow users to search for informative metadata in your JPEG files, check "Retrieve internal meta information". To allow users to search for technical data in your image files, check "Retrieve technical data when available".

Image indexing options screenshot

(You can also associate textual data with images by using the "Use description (.desc) files" method. For more information on that, see "How do I specify my own titles and descriptions for PDF and DOC files?")

Q. How do I index specific internal meta information?

Zoom indexes image metadata according to the "Indexing Options" in the "Zoom Indexer Configuration" window. Each option corresponds to a field in the Windows "Properties" dialog and in Exifer respectively: "Title of page" corresponds to "Title:" and "Headline:", "Meta description" corresponds to "Comments:" and "Description:", "Meta keywords" corresponds to "Keywords:" and "Meta author" corresponds to "Author:".

What to index screenshot

Q. How do I link text to an image to make searching more accurate?

When you link text to an image in an HTML file, the linked text (i.e. the text between the anchor tags <a href="myimage.jpg"> and </a>) and the name of the linked image file are also indexed. For example, the following HTML source:

<html>
...
Click <a href="picture1.gif">here</a> for a screenshot of Zoom Search Engine.<br>
...
</html>

is an example of poorly chosen link text, because only the single generic word "here" is associated with the image. This is the result produced by the above HTML source code (only the word "here" is linked):
Click here for a screenshot of Zoom Search Engine.

A more meaningful way would be:

<html>
...
Click here for a <a href="zoom_screenshot.gif">screenshot of Zoom Search Engine</a>.<br>
...
</html>

This is the result produced by the above HTML source code (the phrase "screenshot of Zoom Search Engine" is linked):
Click here for a screenshot of Zoom Search Engine.

Notice how the linked text and the name of the linked image have been changed. This ensures better association of text to the image file and will yield more accurate image search results. Make sure to turn on "Link Text" indexing from the "Indexing Options" (See "How do I index specific internal meta information?").
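
To see which words end up associated with an image, it can help to extract the link text yourself. The Python sketch below uses the standard `html.parser` module. It only illustrates the idea and is not how Zoom parses pages; the class name `ImageLinkText`, the helper `link_text` and the list of extensions are assumptions made for the example.

```python
from html.parser import HTMLParser

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff")


class ImageLinkText(HTMLParser):
    """Collect the text found between <a href="image"> and </a>."""

    def __init__(self):
        super().__init__()
        self.links = {}       # image file name -> linked text
        self._current = None  # href of the image link being read, if any

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.lower().endswith(IMAGE_EXTENSIONS):
            self._current = href
            self.links[href] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.links[self._current] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None


def link_text(html):
    """Return a dict mapping each linked image to its link text."""
    parser = ImageLinkText()
    parser.feed(html)
    parser.close()
    return parser.links


# The two examples from this answer: the first yields only "here".
print(link_text('Click <a href="picture1.gif">here</a> for a screenshot.'))
print(link_text('Click here for a <a href="zoom_screenshot.gif">'
                'screenshot of Zoom Search Engine</a>.'))
```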

Q. How do I link a thumbnail to an image to make searching more accurate?

When you link a thumbnail to an image in an HTML file, the alternative text (i.e. the alt attribute of the <img> tag) and the name of the linked image file are also indexed. For example, the following HTML source:

<html>
...
<a href="screenshot.jpg"><img src="th_pangkor_beach.jpg" alt="a screenshot"></a></p>
...
</html>

is an example of poorly related alternative text, because it associates the generic phrase "a screenshot" with the linked image file and gives no hint of what the subject is or where the picture was taken. This is the result produced by the above HTML source code:
a screenshot

A more meaningful way would be:

<html>
...
<a href="pangkor_beach.jpg"><img src="th_pangkor_beach.jpg" alt="Screen shot of Pangkor beach"></a></p>
...
</html>

This is the result produced by the above HTML source code:
Screen shot of Pangkor beach

Notice how the alternative text (i.e. alt="Screen shot of Pangkor beach") and the name of the linked image file have been changed so that the text correlates better with the linked image file and search results are more accurate. Make sure to turn on "ALT Text" indexing from the "Indexing Options" (See "How do I index specific internal meta information?").
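
A quick way to audit your pages is to list the alt text attached to each linked image. The following Python sketch does this with the standard `html.parser` module. It is an illustration only and makes no claim about Zoom's own parser; `ThumbnailAltText` and `thumbnail_alt_text` are names invented for the example.

```python
from html.parser import HTMLParser


class ThumbnailAltText(HTMLParser):
    """Map each linked image to the alt text of the thumbnail inside the link."""

    def __init__(self):
        super().__init__()
        self.alt_by_image = {}
        self._href = None  # target of the <a> element currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href = attrs.get("href")
        elif tag == "img" and self._href:
            # An <img> inside a link: record its alt text for the link target.
            self.alt_by_image[self._href] = attrs.get("alt") or ""

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def thumbnail_alt_text(html):
    """Return a dict mapping each linked image to its thumbnail's alt text."""
    parser = ThumbnailAltText()
    parser.feed(html)
    parser.close()
    return parser.alt_by_image


print(thumbnail_alt_text(
    '<a href="pangkor_beach.jpg">'
    '<img src="th_pangkor_beach.jpg" alt="Screen shot of Pangkor beach"></a>'
))
```

An empty string in the result marks a thumbnail link that has no alt text at all.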

Q. I do not want Zoom to index certain images, for example images used as thumbnails on my web pages. What can I do?

Zoom Search Engine v5.0 and above has a feature that allows you to filter out images below a certain file size. Go to "Zoom Indexer Configuration: Scan Options" and double-click on the .jpg extension. You can tell Zoom to index image files only if they exceed the size criteria.

Filter size screenshot

The default value is 5 Kbytes. This is good enough to filter out most thumbnails.
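
To estimate in advance which files such a threshold would exclude, you can apply the same rule yourself. The Python sketch below assumes that 1 Kbyte means 1024 bytes and that a file must be strictly larger than the threshold to be indexed. Neither detail is stated above, and the function names are invented for the example.

```python
import os

MIN_SIZE_KB = 5  # default threshold quoted above


def exceeds_size_threshold(size_bytes, min_kb=MIN_SIZE_KB):
    """True if a file of size_bytes is larger than the threshold (assumed 1024-byte KB)."""
    return size_bytes > min_kb * 1024


def would_be_indexed(path, min_kb=MIN_SIZE_KB):
    """Apply the size rule to a file on disk."""
    return exceeds_size_threshold(os.path.getsize(path), min_kb)
```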

Alternatively, go to "Zoom Indexer Configuration: Skip Options" and add the folder or file names that you do not want Zoom to index. Zoom will not index any image file whose path name matches any of these patterns.

Page and folder skip list screenshot

In the above example, I have configured Zoom to skip indexing any image file with path name that contains "/TIFFs/thumb", "/icon", "/thumb" or "/PNGs/thumb". Hence all images in "http://mysite.com/images/icon/" will not be indexed. Likewise, all images in "http://mysite.com/images/TIFFs/thumb" will not be indexed. In fact, I could have just specified "/thumb" in place of "/TIFFs/thumb", "/thumb" and "/PNGs/thumb". More examples below:

Will not be indexed:
http://mysite.com/images/icon.jpg
http://mysite.com/images/TIFFs/thumb.jpg
http://mysite.com/images/PNGs/thumb_cat.jpg
http://mysite.com/images/th_cat.jpg
http://mysite.com/images/example_th_cat.jpg

Will be indexed:
http://mysite.com/images/cat.jpg
http://mysite.com/images/TIFFs/cat.jpg
http://mysite.com/images/PNGs/cat.jpg
http://mysite.com/images/thick_book.jpg
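
The skip list appears to behave like a substring match on the path name. The short Python sketch below reproduces the two lists above under that assumption. `SKIP_PATTERNS` and `is_skipped` are names invented for the example, and the match is assumed to be case-sensitive. The "th_" entry is also an assumption: it is added because the th_cat.jpg examples would not be skipped by the four patterns named in the text alone.

```python
# Patterns named in the text, plus "th_" (assumed, see the note above).
SKIP_PATTERNS = ["/TIFFs/thumb", "/icon", "/thumb", "/PNGs/thumb", "th_"]


def is_skipped(url, patterns=SKIP_PATTERNS):
    """True if any skip pattern occurs anywhere in the URL."""
    return any(pattern in url for pattern in patterns)


print(is_skipped("http://mysite.com/images/PNGs/thumb_cat.jpg"))
print(is_skipped("http://mysite.com/images/thick_book.jpg"))
```

Testing your own URLs against a list like this before indexing shows whether a short pattern such as "th_" catches more files than intended.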

Check "Skip files or directories that begin with an underscore" to skip files and directories that begin with an underscore '_'.

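Q. How do I search for images with specific technical information, for example all images with width 800 pixels shot using a Sony camera?

First, decide which technical information you want to search on. In this example, it is "Width" and "Make". The syntax for searching technical data is: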

Type:Value

Make sure to check "Colon" under "Zoom Indexer Configuration: Indexing Options: Indexing words" so that a colon is allowed to join 2 words.

Search tips Indexing Options: Indexing Words screenshot

To search for all images with width 800, use "Width:800". To search for all pictures taken with a Sony camera, use "Make:Sony". To combine the two, use "Width:800 Make:Sony" as your search words.
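
If you generate such searches from a script or a form, the search words can be assembled from Type:Value pairs. The Python sketch below only builds the query text; `build_query` is a name invented for the example and it is not part of Zoom.

```python
def build_query(**fields):
    """Join Type:Value pairs with spaces, in the order they are given."""
    return " ".join(f"{name}:{value}" for name, value in fields.items())


print(build_query(Width=800, Make="Sony"))  # Width:800 Make:Sony
```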

Refer to "How do I perform advanced searches?" for more search tips.

(*) A tool for editing meta information in PNGs called TweakPNG can be found at Jason Summers' website. Visit the Portable Network Graphics homepage for more information on PNGs.

(**) Refer to http://www.w3.org/Graphics/GIF/spec-gif89a.txt for the specification on the Graphics Interchange Format (GIF).

(+) If none of the tags for the "Description" meta information are found, ImageInfo simply creates a string that states the file type and the image dimensions, for example "JPEG file, size 256x256".
