Technologies of Vision: semantic image labelling

TextInScene

* what is text in scene

Since text conveys semantic information, reading text in images plays an important role in understanding image content. Text appearing in images (and video frames) falls into two broad categories: overlaid text and scene text. The former is text superimposed on the image, such as timestamps, captions and titles; it is deliberately added to the image.
In contrast, scene text is inherently embedded in the scene: for example hotel or shop placards, road signs, street names and posters. Because it occurs naturally, such text can appear under a wide range of conditions, depending on factors related to the scene and to the acquisition process. This generally makes its detection and reading very challenging. While OCR technology is fairly mature, reading characters from photographs remains a hard problem, starting with their detection and segmentation.

* applications

Reading text embedded in natural scenes plays an important role in several applications, such as indexing multimedia archives, recognizing signs in driver-assistance systems, providing scene information to visually impaired people, and identifying vehicles by reading their license plates. With the widespread diffusion of low-priced digital cameras and of mobile phones equipped with good-quality cameras, text extraction from camera-captured scenes has gained renewed attention in computer vision research.

* our approach

The first step is an intensity normalization that enhances image detail and local contrast in shadowed regions. The normalization is achieved by computing the divisive local contrast.
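As an illustration of this step, the sketch below divides each pixel by the mean intensity of its neighbourhood. This is only one common reading of "divisive local contrast"; the function name `divisive_local_contrast`, the square window, the `radius` and the `eps` term are assumptions made for the example and are not taken from the papers listed in the references.

```python
def divisive_local_contrast(image, radius=1, eps=1e-6):
    """Divide each pixel by the mean of its (2*radius+1)^2 neighbourhood.

    `image` is a list of rows of non-negative numbers.
    Windows are clipped at the image border.
    """
    height, width = len(image), len(image[0])
    normalized = []
    for y in range(height):
        row = []
        for x in range(width):
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [image[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(window) / len(window)
            # eps (assumed) avoids division by zero in completely black areas
            row.append(image[y][x] / (local_mean + eps))
        normalized.append(row)
    return normalized


# The same dark stroke on a light background, well lit and in shadow
bright = [[200, 200, 200], [200, 100, 200], [200, 200, 200]]
shadow = [[20, 20, 20], [20, 10, 20], [20, 20, 20]]
center_bright = divisive_local_contrast(bright)[1][1]
center_shadow = divisive_local_contrast(shadow)[1][1]
print(round(center_bright, 3), round(center_shadow, 3))  # both about 0.529
```

Because the division cancels a multiplicative change of illumination, the stroke gets nearly the same normalized value in both patches, which is the effect described for shadowed regions.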
Two thresholds are determined from the shape of the histogram of the normalized image. They are used to compute two binary maps, which should contain positive- and negative-contrast text, respectively, if present.
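The double thresholding can be sketched as follows. The text does not say how the thresholds are derived from the histogram shape, so this example uses a stand-in rule: the median is taken as the background level and the thresholds are placed `k` robust deviations below and above it. The names `two_threshold_maps`, `k` and `min_spread` are hypothetical.

```python
from statistics import median


def two_threshold_maps(normalized, k=2.0, min_spread=0.05):
    """Return (darker_map, brighter_map, low, high) for a normalized image.

    Stand-in rule (an assumption): background = median of all values,
    spread = median absolute deviation, thresholds = background -/+ k * spread.
    """
    values = [v for row in normalized for v in row]
    background = median(values)
    spread = median([abs(v - background) for v in values])
    spread = max(spread, min_spread)  # guard against a perfectly flat background
    low, high = background - k * spread, background + k * spread
    darker_map = [[1 if v < low else 0 for v in row] for row in normalized]
    brighter_map = [[1 if v > high else 0 for v in row] for row in normalized]
    return darker_map, brighter_map, low, high


normalized = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 0.5, 1.0, 1.6],
    [1.0, 0.5, 1.0, 1.6],
    [1.0, 1.0, 1.0, 1.0],
]
darker_map, brighter_map, low, high = two_threshold_maps(normalized)
```

Here the two pixels at 0.5 end up in `darker_map` and the two at 1.6 in `brighter_map`, one map per contrast polarity.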
The connected components of these bitmaps are analyzed separately: their shape features (area, elongation, convexity, ...) and the corresponding gradient in the input image are examined by a cascade of attribute filters, which marks likely non-text components as non-interesting. A few thresholds are read from a "prior knowledge" file related to the scenario (text on athletes' bibs, book covers, text in the city, ...).
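A minimal sketch of component extraction followed by a filter cascade is given below. It covers only two shape attributes (area and elongation) and leaves out convexity and the gradient test. The `PRIOR` dictionary stands in for the "prior knowledge" file; its keys and values are invented for the example.

```python
from collections import deque


def connected_components(bitmap):
    """Return the 4-connected components of a 0/1 bitmap as lists of (row, col)."""
    height, width = len(bitmap), len(bitmap[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if not bitmap[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and bitmap[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def shape_features(pixels):
    """Area and bounding-box elongation of one component."""
    rows = [p[0] for p in pixels]
    cols = [p[1] for p in pixels]
    h = max(rows) - min(rows) + 1
    w = max(cols) - min(cols) + 1
    return {"area": len(pixels), "elongation": max(h, w) / min(h, w)}


# Hypothetical scenario thresholds, standing in for the "prior knowledge" file
PRIOR = {"min_area": 3, "max_elongation": 5.0}

# Filters are applied in order; the first one that fails rejects the component
CASCADE = [
    ("area", lambda f, p: f["area"] >= p["min_area"]),
    ("elongation", lambda f, p: f["elongation"] <= p["max_elongation"]),
]


def rejected_by(pixels, prior=PRIOR):
    """Name of the first filter that rejects the component, or None if it survives."""
    features = shape_features(pixels)
    for name, passes in CASCADE:
        if not passes(features, prior):
            return name
    return None


bitmap = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
results = [rejected_by(c) for c in connected_components(bitmap)]
print(results)  # [None, 'area', 'elongation']
```

Ordering the cascade from cheap tests to expensive ones means that most clutter is discarded before any costly attribute has to be computed.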
To extract text lines, the surviving components, possibly after being split, are recursively clustered according to proximity, alignment and size similarity until a termination criterion is satisfied. Only clusters that potentially contain a single text line are considered.
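The grouping criteria can be illustrated with a simplified sketch that links bounding boxes pairwise and merges the links with a union-find structure. This is a single-pass simplification, not the recursive clustering with splitting and a termination criterion described above; the function names and the `gap_factor`, `align_factor` and `size_ratio` parameters are assumptions.

```python
def compatible(a, b, gap_factor=1.0, align_factor=0.5, size_ratio=2.0):
    """Check proximity, alignment and size similarity of two boxes (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    gap = max(bx - (ax + aw), ax - (bx + bw), 0)  # horizontal gap, 0 if overlapping
    close = gap <= gap_factor * max(ah, bh)
    aligned = abs((ay + ah / 2) - (by + bh / 2)) <= align_factor * min(ah, bh)
    similar = max(ah, bh) / min(ah, bh) <= size_ratio
    return close and aligned and similar


def cluster_lines(boxes):
    """Group box indices into candidate horizontal text lines."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if compatible(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


boxes = [
    (0, 0, 8, 10),     # three characters on one line
    (10, 1, 8, 10),
    (20, 0, 8, 10),
    (0, 40, 8, 10),    # a character on a lower line
    (60, 0, 30, 40),   # a much larger blob
]
clusters = cluster_lines(boxes)
print(clusters)  # [[0, 1, 2], [3], [4]]
```

Boxes 0 and 2 are too far apart to be linked directly but end up in the same cluster through box 1, which is how a chain of neighbouring characters forms a line.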
Once a cluster is accepted as a candidate text line, all components inside its region that were previously marked as non-interesting are reconsidered for possible restoration before the text recognition phase.
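A possible restoration rule is sketched below: a rejected component is brought back if its centre lies inside the bounding box of the accepted line and it is no taller than the line. The actual restoration criteria are not given in this description, so both conditions and the name `restore_components` are hypothetical.

```python
def restore_components(line_box, rejected_boxes):
    """Return the rejected boxes (x, y, w, h) worth restoring for a text line.

    Hypothetical rule: the centre of the box falls inside `line_box`
    and the box is not taller than the line itself.
    """
    lx, ly, lw, lh = line_box
    restored = []
    for box in rejected_boxes:
        x, y, w, h = box
        cx, cy = x + w / 2, y + h / 2
        inside = lx <= cx <= lx + lw and ly <= cy <= ly + lh
        if inside and h <= lh:
            restored.append(box)
    return restored


line_box = (0, 0, 100, 20)
rejected = [
    (30, 2, 2, 3),     # small dot inside the line, e.g. the dot of an "i"
    (40, 50, 5, 5),    # small component well below the line
    (10, -10, 5, 60),  # tall stroke crossing the line
]
restored = restore_components(line_box, rejected)
print(restored)  # [(30, 2, 2, 3)]
```

Small parts such as dots and punctuation are typical victims of an area filter, which is why a second look inside an accepted line can help the recognition phase.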
OCR acts as the last filter, rejecting non-text clusters.

* demo

A step by step example of text localization, segmentation and reading.
Examples of text detection and segmentation in several images.

* references

S. Messelodi, C.M. Modena
Scene Text Recognition and Tracking to Identify Athletes in Sport Videos
Multimedia Tools and Applications, Special Issue on Automated Information Extraction in Media Production (on-line 2011) [doi]

S. Messelodi, C.M. Modena
Automatic Identification and Skew Estimation of Text Lines in Real Scene Images
Pattern Recognition, Vol. 32, No. 5, pp. 789-808, May 1999 [abstract] [doi]

S. Messelodi, C.M. Modena
Context Driven Text Segmentation and Recognition
Pattern Recognition Letters, Vol. 17, No. 1, pp. 47-56, 1996 [abstract] [doi]


* databases

We carried out our experiments on:
  • a labelled database of 1003 book covers acquired by a CCD camera, where text on the same cover may appear in different fonts, on different backgrounds and at different slopes [doi]
  • a database of 249 171 video frames of athletic events, where the text of interest is on the athletes' bibs [doi]

* contact

Please contact
Carla Maria Modena   |  e-mail modena (at) fbk . eu
Stefano Messelodi  |  e-mail messelod (at) fbk . eu