This paper presents an abductive formalism for interpreting images of multi-object scenes in terms of objects expected in the scenes. The inferencing scheme provides a mechanism for integrating relational feature-based evidence with the information about the local region properties like color, texture, etc. for making correct recognition decisions. The reasoning scheme is robust enough to tackle the problem of occlusion. A formal definition of the class of plausible interpretations of an image is provided, and competing interpretations are characterized. A problem solving scheme for generation of plausible interpretations has been presented. The concept of the best interpretation of the image has been introduced, and an algorithm has been designed that is guaranteed to provide the best interpretation among a set of plausible conditions.