Semantic classification of image content using LVQ

Document Type

Conference Proceeding

Publication Date

12-1-2004

Abstract

Advances in multimedia technology have led to rapid growth in image data. Classification of image content has mostly been limited to classification based on caption text, simple image measures, or human classification. This presentation describes a new semantic image classification method. Low-level color, texture, and rotation- and scale-invariant shape features are first extracted from each image, and a Learning Vector Quantization (LVQ) neural network is then used to perform the semantic classification. The color feature is based on the HSV (hue, saturation, value) colorspace histogram of the image and uses the mean and variance of each dimension of the colorspace (H, S, and V), resulting in a six-element feature vector. Fractal dimension provides a texture feature, while Fourier descriptors provide a shape feature. An LVQ network has a first competitive layer and a second linear layer. The competitive layer learns to classify input vectors. The linear layer transforms the competitive layer's classes into target classifications defined by the user. The classes learned by the competitive layer are called subclasses, and the classes of the linear layer are called target classes. For each training pattern, the reference vector closest to it is determined. The corresponding output neuron is called the winner neuron. The weights of the connections to this neuron are then adapted. The direction of the adaptation depends on whether the class of the training pattern and the class assigned to the reference vector coincide. If they coincide, the reference vector is moved closer to the training pattern; otherwise it is moved farther away. This movement is controlled by a parameter called the learning rate, which states how far the reference vector is moved as a fraction of its distance to the training pattern.
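
The six-element color feature and the winner-update rule described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the Matlab implementation mentioned in the abstract. The function names `hsv_color_feature` and `lvq1_step` are hypothetical, the statistics are computed directly over pixels instead of over a histogram, hue is treated as a plain number (its circular nature is ignored), and the update is the basic LVQ1 rule, which may differ from the variant actually used.

```python
import colorsys
import math


def hsv_color_feature(rgb_pixels):
    """Return [mean_H, mean_S, mean_V, var_H, var_S, var_V].

    rgb_pixels is an iterable of (r, g, b) tuples with 8-bit channels.
    Assumption: statistics are taken over pixels, and hue is averaged
    as an ordinary value in [0, 1].
    """
    hsv = [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
           for r, g, b in rgb_pixels]
    n = len(hsv)
    means = [sum(p[i] for p in hsv) / n for i in range(3)]
    variances = [sum((p[i] - means[i]) ** 2 for p in hsv) / n
                 for i in range(3)]
    return means + variances


def lvq1_step(prototypes, prototype_labels, x, x_label, learning_rate):
    """Apply one LVQ1 update in place and return the winner's index.

    The winner is the reference vector closest to x (Euclidean distance).
    It moves toward x if the labels coincide and away from x otherwise,
    by learning_rate times the difference between x and the winner.
    """
    winner = min(range(len(prototypes)),
                 key=lambda k: math.dist(prototypes[k], x))
    sign = 1.0 if prototype_labels[winner] == x_label else -1.0
    prototypes[winner] = [w + sign * learning_rate * (xi - w)
                          for w, xi in zip(prototypes[winner], x)]
    return winner


# Usage: one update with two 2-D reference vectors (toy values).
refs = [[0.0, 0.0], [1.0, 1.0]]
ref_labels = ["urban", "nature"]
lvq1_step(refs, ref_labels, [0.2, 0.0], "urban", learning_rate=0.5)
```

In a full feature vector, the texture and shape features would be appended to the six color values before training; that step is omitted here because the abstract does not give their dimensions.
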
Usually the learning rate is decreased over time, so that initial changes are larger than changes made in later epochs of the training process. Learning may be terminated when the positions of the reference vectors no longer change. This work focused on classification of images into the semantic classes "urban" and "nature/rural" for testing purposes. Two hundred color photos depicting city and landscape scenes (roughly 50% each) were used for training and another 200 for testing. Precision and recall are used to measure the performance of the algorithm. The method has been implemented in MATLAB. A summary of results will be presented at the conference.
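
As a rough illustration of the decreasing learning rate and of the two performance measures named above, the following Python sketch may help. The linear decay schedule and the names `decayed_rate` and `precision_recall` are assumptions made for this example; the abstract does not specify the schedule, and the labels below are toy values, not results from the study.

```python
def decayed_rate(initial_rate, epoch, n_epochs):
    """Linearly decreasing learning rate (assumed schedule).

    Returns initial_rate at epoch 0 and approaches 0 at epoch n_epochs.
    """
    return initial_rate * (1.0 - epoch / n_epochs)


def precision_recall(true_labels, predicted_labels, positive):
    """Precision and recall for one class, treated as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Usage with four toy test images.
truth = ["urban", "urban", "nature", "nature"]
predicted = ["urban", "nature", "nature", "nature"]
urban_precision, urban_recall = precision_recall(truth, predicted, "urban")
```

Reporting both measures per class matters here because one misclassified "urban" image lowers urban recall and nature precision at the same time.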

Publication Source (Journal or Book title)

IIE Annual Conference and Exhibition 2004

First Page

2229
