Textual Visual Semantic Dataset for Text Spotting

example

This dataset has been used in these papers:

Visual Re-ranking with Natural Language Understanding for Text Spotting paper code

Semantic Relatedness Based Re-ranker for Text Spotting paper code

Motivation

Text Spotting in the Wild is a Computer Vision (CV) task consisting of detecting and recognizing text appearing in images (e.g. signboards, traffic signals, or brands on clothing or objects). It remains an unsolved problem due to the complexity of the contexts in which text appears (uneven backgrounds, shading, occlusions, perspective distortions, etc.). Only a few CV approaches try to exploit the relation between the text and its surrounding environment to better recognize text in the scene. In this work, we propose a visual context dataset for Text Spotting in the wild, in which the publicly available COCO-text dataset (Veit et al., 2016) has been extended with information about the scene (such as objects and places appearing in the image), enabling researchers to include semantic relations between text and scene in their Text Spotting systems, and offering a common framework for such approaches.

Highlights

This dataset is based on COCO-text; please visit https://github.com/andreasveit/coco-text. COCO-text is in turn based on Microsoft COCO; please visit http://mscoco.org/ for more information on the COCO dataset, including the image data, object annotations, and caption annotations.

1 - Extracting full images with ground-truth (gt) bounding boxes from COCO-text

2 - Extracting the bounding box and top-k objects (from an object classifier)

  • Matlab 2018 - you only need to run it once
  • MatConvNet, an open-source deep learning framework
  • Download the most recent pre-trained SOTA object classifier, or ResNet-152 (used in this code)
  • Run the Extract_BBox.m file: file 1 is the bounding box, file 2 is the full image
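The cropping in step 1 can be sketched in Python as well; the `[x, y, width, height]` box layout and the `utf8_string` field follow the COCO-text annotation format, while `crop_text_regions` and its `image` argument are illustrative assumptions rather than part of this repository:

```python
def bbox_to_crop_box(bbox):
    """Convert a COCO-style [x, y, width, height] box into the
    (left, upper, right, lower) rectangle used by PIL's Image.crop."""
    x, y, w, h = bbox
    return (int(x), int(y), int(x + w), int(y + h))


def crop_text_regions(image, annotations):
    """Crop every ground-truth text region out of a full image.

    `image` is any object with a PIL-style .crop() method; each
    annotation is a COCO-text dict with a 'bbox' field and, for
    legible text, the ground-truth 'utf8_string'.
    """
    crops = []
    for ann in annotations:
        word = ann.get('utf8_string', '')
        crops.append((word, image.crop(bbox_to_crop_box(ann['bbox']))))
    return crops
```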

Cropped text images

full image

Visual contexts dataset (object, places*)

  • word level
  • sentence level
  • Image_id, spotted word(gt), objects, places
  • Example: COCO_train2014_000000000081.jpg,airfracne, airliner, airfield
  • Learning the similarity/distance between two objects/places can be useful to filter out duplicated cases and false positive examples. Load the visual-pairs and visual-pairs models and run the precomputed model: M = containers.Map(pairsobject,cosine_sim), then M('airfield airliner'), M('crosswalk plaza'), etc.
  • For your own dataset, run sim.py with GloVe (840 billion tokens) for a better similarity score, or sim-fastText with fastText (600 billion tokens), which can handle out-of-vocabulary (OOV) words.
  • It is also possible to visualize the word vectors using the notebook visualization-embedding

*You can find the places model at Places365-CNNs
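As a rough Python analogue of the precomputed containers.Map lookup above, the cosine similarity between two visual-context labels can be computed directly from their embedding vectors; the `embeddings` dict here is a stand-in for vectors loaded from GloVe or fastText:

```python
import math


def cosine_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pair_similarity(embeddings, word1, word2):
    """Similarity of two visual-context labels, given a
    word -> vector dict (e.g. loaded from GloVe 840B)."""
    return cosine_sim(embeddings[word1], embeddings[word2])
```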

For testing (object1, object2, places)

Visual contexts 2 (image description, object, place)

  • word level
  • sentence level
  • Image_id, spotted word(gt/baseline), caption
  • Example: COCO_train2014_000000000081.jpg, airfracne, a large jetliner flying through the sky with a sky background, airliner, airfield
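A row of this file can be split into named fields with the standard csv module; the field names below are an assumption inferred from the example row, not an official schema:

```python
import csv
import io

# Assumed field order, inferred from the example row above.
FIELDS = ["image_id", "spotted_word", "caption", "object", "place"]


def parse_row(line):
    """Parse one comma-separated row into a dict of named fields."""
    values = next(csv.reader(io.StringIO(line)))
    return dict(zip(FIELDS, (v.strip() for v in values)))
```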

For testing (image description)

Object and text co-occurrence database

example3

  • word level

  • sentence level

  • spotted word (w), places/object (c) - co-occurrence information between the spotted text and the objects/places in the image

  • The conditional probability of an object and a text occurring together in COCO-text, as in the example above of the sports channel (kt) with a racket: object-text-co-occurrence-(P(w|c).csv

  • Run counting_pairs.py to count the pairs (spotted text, object/place) that occur together

  • To get P(w|c) for pairs occurring together in COCO-text, load M = containers.Map(Pairs,Pairs_prob), then query the pairs: M(' kt racket'), M(' pay parking'), etc.
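The counting and conditional-probability steps above can be sketched in Python; `cooccurrence_probs` is a hypothetical helper that estimates P(w|c) as count(w, c) / count(c):

```python
from collections import Counter


def cooccurrence_probs(pairs):
    """Estimate P(w|c) from a list of (spotted_word, context) pairs,
    where the context is an object or place label."""
    pair_counts = Counter(pairs)
    context_counts = Counter(context for _, context in pairs)
    return {(w, c): n / context_counts[c]
            for (w, c), n in pair_counts.items()}
```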

Dictionary 300K

Feedback

🙋‍♂️ Suggestions and opinions about this dataset (both positive and negative) are greatly welcome 🙇‍♂️. Please contact the author by sending an email to asabir@cs.upc.edu

Citation

Please use the following bibtex entry:

@inproceedings{sabir2020textual,
  title={Textual visual semantic dataset for text spotting},
  author={Sabir, Ahmed and Moreno-Noguer, Francesc and Padr{\'o}, Llu{\'\i}s},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  pages={542--543},
  year={2020}
}

About

Textual Visual Semantic Dataset for Text Spotting. CVPRW 2020
