This README describes the following two datasets:

  1.   PurdueShapes5

  2.   PurdueShapes5MultiObject

The first is meant for working with the object detection and localization part of
DLStudio, and the second for working with the semantic segmentation part.

================================================================================
================================================================================


                          The PurdueShapes5 Dataset
                          =========================

This dataset was designed for giving classroom demonstrations of convolutional
networks that can carry out classification and regression at the same time.  The
classification is for the shape category of the object in an image and the regression
for the bounding box that encloses the object.

For any questions related to this dataset, contact the author at

     kak@purdue.edu

with the string "PurdueShapes5" in your subject line.

The PurdueShapes5 dataset contains 32x32 images of binary shapes belonging to the
following five classes:

                       1.  rectangle
                       2.  triangle
                       3.  disk
                       4.  oval
                       5.  star

The dataset was generated by randomizing the sizes and the orientations of these five
patterns in 32x32 images.  Since the patterns are rotated with a very simple
non-interpolating transform, just the act of random rotations can introduce boundary
and even interior noise in the patterns.

Each 32x32 image is stored in the dataset as the following list:

                           [R, G, B, Bbox, Label]
where

      R     :   is a 1024 element list of the values for the red component
                of the color at all the pixels

      G     :   the same as above but for the green component of the color

      B     :   the same as above but for the blue component of the color

      Bbox  :   a list like [x1,y1,x2,y2] that defines the bounding box
                for the object in the image

      Label :   the shape of the object
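
As an illustration of this layout, the following minimal sketch turns the three
1024-element lists of one record into a 32x32x3 array.  It assumes that the pixels
are stored in row-major order, which is not stated above.  The helper name
record_to_image and the synthetic record are made up for the example, and the label
value in it is only a placeholder since the stored form of Label is not specified
here.

```python
import numpy as np

def record_to_image(record, size=32):
    """Convert one [R, G, B, Bbox, Label] record into an image array.

    Assumption: each channel list is in row-major order, so element i
    belongs to row i // size and column i % size.
    """
    R, G, B, bbox, label = record
    channels = [np.array(c, dtype=np.uint8).reshape(size, size) for c in (R, G, B)]
    image = np.stack(channels, axis=-1)        # shape (size, size, 3)
    return image, bbox, label

# Synthetic record for illustration only (not taken from the dataset).
n = 32 * 32
record = [[255] * n, [0] * n, [10] * n, [4, 6, 20, 25], 'rectangle']
image, bbox, label = record_to_image(record)
print(image.shape, bbox, label)
```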

I serialize the dataset with Python's pickle module and then compress it with the
gzip module.
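
Reading such a file back is a matter of reversing the two steps.  The sketch below
shows the round trip on a tiny stand-in file, because the top-level layout of the
real files is not described in this README.  The function name load_gzipped_pickle,
the stand-in file name, and the stand-in contents are all hypothetical.

```python
import gzip
import os
import pickle
import tempfile

def load_gzipped_pickle(path):
    """Decompress a gzip file and unpickle its contents."""
    with gzip.open(path, 'rb') as f:
        return pickle.load(f)

# Write a small stand-in file the same way: pickle first, then gzip.
sample = {0: [[0] * 1024, [0] * 1024, [0] * 1024, [1, 2, 10, 12], 'disk']}
path = os.path.join(tempfile.mkdtemp(), 'stand-in.gz')
with gzip.open(path, 'wb') as f:
    pickle.dump(sample, f)

loaded = load_gzipped_pickle(path)
print(type(loaded), len(loaded))
```

If a file turns out to have been pickled under Python 2, passing encoding='latin1'
to pickle.load may be necessary.  As with any pickle file, unpickle only files that
come from a source you trust.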

The dataset files you will find in this directory are:

               PurdueShapes5-10000-train.gz
               PurdueShapes5-1000-test.gz
               PurdueShapes5-20-train.gz
               PurdueShapes5-20-test.gz

The number that follows the main name string "PurdueShapes5-" is for the number of
images in the dataset.

You will find the last two datasets, with 20 images each, useful for debugging your
logic for object detection and bounding-box regression.


================================================================================
================================================================================


                    The PurdueShapes5MultiObject Dataset
                    ====================================

This dataset was designed for giving classroom demonstrations of convolutional
networks that can carry out semantic segmentation of images. Since each image
in the dataset contains multiple objects, it is possible to give demonstrations
of how a neural network can segment out multiple objects simultaneously from
an image.

The very first thing to note is that the images in the PurdueShapes5MultiObject
dataset are of size 64x64.  Each image has a random number (up to five) of objects
drawn from the following five shapes: rectangle, triangle, disk, oval, and star.
Each shape is randomized with respect to all its parameters, including those for its
scale and location in the image.

Each image in the dataset is represented by two data objects, one a list and the
other a dictionary. The list data object consists of the following items:

        [R, G, B, mask_array, mask_val_to_bbox_map]                          ## (A)
            
and the other data object is a dictionary that is set to:
            
                label_map = {'rectangle':50, 
                             'triangle' :100, 
                             'disk'     :150, 
                             'oval'     :200, 
                             'star'     :250}                                ## (B)
            
Note that the second data object is the same for every image, as shown above.

In the rest of this README block, I'll explain in greater detail the elements of the
list in line (A) above.

            
R,G,B:
------

Each of these is a 4096-element array whose elements store the corresponding color
values at each of the 4096 pixels in a 64x64 image.  That is, R is a list of 4096
integers, each between 0 and 255, for the value of the red component of the color at
each pixel. Similarly, for G and B.
            

mask_array:
----------

The fourth item in the list shown in line (A) above is the mask, which is a numpy
array of shape:
            
         (5, 64, 64)
        
It is initialized by the command:
            
      mask_array = np.zeros((5,64,64), dtype=np.uint8)
            
In essence, the mask_array consists of five planes, each of size 64x64.  Each plane
of the mask array represents an object type according to the following shape_index:
            
      shape_index = (label_map[shape] - 50) // 50
            
where the label_map is as shown in line (B) above.  In other words, the shape_index
values for the different shapes are:
            
                     rectangle:  0
                      triangle:  1
                          disk:  2
                          oval:  3
                          star:  4
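
The table above can be reproduced directly from the formula.  This is only a small
check; the name mask_val_for_plane is introduced here for illustration.

```python
label_map = {'rectangle': 50, 'triangle': 100, 'disk': 150, 'oval': 200, 'star': 250}

# Plane index in mask_array for each shape, using the formula above.
shape_index = {shape: (val - 50) // 50 for shape, val in label_map.items()}
print(shape_index)
# {'rectangle': 0, 'triangle': 1, 'disk': 2, 'oval': 3, 'star': 4}

# Inverse lookup: the mask value stored in a given plane.
mask_val_for_plane = {idx: label_map[shape] for shape, idx in shape_index.items()}
```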
            
Therefore, the first layer (of index 0) of the mask is where the pixel values of 50
are stored at all those pixels that belong to the rectangle shapes.  Similarly, the
second mask layer (of index 1) is where the pixel values of 100 are stored at all
those pixel coordinates that belong to the triangle shapes in an image; and so on.
            
It is in the manner described above that we define five different masks for an image
in the dataset.  Each mask is for a different shape and the pixel values at the
nonzero pixels in each mask layer are keyed to the shapes also.
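
To make the layout concrete, here is a minimal sketch that fills two planes of a
fresh mask_array and then lists the values found in each plane.  The helper paint
and the rectangular regions are made up for the example.  They stand in for real
shape footprints and are not how the dataset generator works.

```python
import numpy as np

label_map = {'rectangle': 50, 'triangle': 100, 'disk': 150, 'oval': 200, 'star': 250}
mask_array = np.zeros((5, 64, 64), dtype=np.uint8)

def paint(mask_array, shape, rows, cols):
    """Store the shape's mask value in its own plane over a block of pixels."""
    plane = (label_map[shape] - 50) // 50
    mask_array[plane, rows, cols] = label_map[shape]

# Hypothetical footprints, given as row and column slices.
paint(mask_array, 'rectangle', slice(20, 26), slice(56, 64))
paint(mask_array, 'disk', slice(41, 60), slice(37, 56))

for plane in range(5):
    print(plane, np.unique(mask_array[plane]).tolist())
```

Only planes 0 and 2 end up with nonzero entries, and the nonzero value in each
is the one assigned to that plane's shape.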
            
A reader is likely to wonder about the need for this redundancy in the dataset
representation of the shapes in each image.  Such a reader is likely to ask: Why
can't we just use the binary values 1 and 0 in each mask layer to mark the pixels
that belong to the corresponding shape?  Setting these mask values to 50, 100, etc.,
was done merely for convenience.  I went with the intuition that the learning needed
for multi-object segmentation would become easier if each shape was represented by a
different pixel value in the corresponding mask.  So I went ahead and incorporated
that in the dataset generation program itself.
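
Since the nonzero value in each plane is fixed by the shape that the plane stands
for, binary masks can be derived from the keyed ones when they are preferred.  A
minimal sketch follows; the function name to_binary_masks and the toy footprint
are assumptions made for the example.

```python
import numpy as np

def to_binary_masks(mask_array):
    """Return an array of the same shape with 1 wherever the keyed mask is nonzero."""
    return (mask_array > 0).astype(np.uint8)

# Toy mask with a single made-up star footprint in plane 4.
mask_array = np.zeros((5, 64, 64), dtype=np.uint8)
mask_array[4, 10:20, 10:20] = 250

binary = to_binary_masks(mask_array)
print(binary.shape, int(binary.max()), int(binary.sum()))
```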

The mask values for the shapes are not to be confused with the actual RGB values of
the pixels that belong to the shapes. The RGB values at the pixels in a shape are
randomly generated.  Yes, all the pixels in a shape instance in an image have the
same RGB values (but that value has nothing to do with the values given to the mask
pixels for that shape).
            
            
mask_val_to_bbox_map:
--------------------
                   
The fifth item in the list in line (A) above is a dictionary that tells us what
bounding-box rectangle to associate with each shape in the image.  To illustrate what
this dictionary looks like, assume that an image contains only one rectangle and only
one disk.  The dictionary in this case will look like:
            
        mask values to bbox mappings:  {200: [], 
                                        250: [], 
                                        100: [], 
                                         50: [[56, 20, 63, 25]], 
                                        150: [[37, 41, 55, 59]]}
            
Should there happen to be two rectangles in the same image, the dictionary would then
be like:
            
        mask values to bbox mappings:  {200: [], 
                                        250: [], 
                                        100: [], 
                                         50: [[56, 20, 63, 25], [18, 16, 32, 36]], 
                                        150: [[37, 41, 55, 59]]}
    
Therefore, it is not a problem even if all the objects in an image are of the same
type.  Remember, the objects that are selected for an image are drawn randomly from
the different shapes.  By the way, an entry like '[56, 20, 63, 25]' for the bounding
box means that the upper-left corner of the BBox for the 'rectangle' shape is at
(56,20) and the lower-right corner of the same is at the pixel coordinates (63,25).
            
As far as the BBox quadruples are concerned, in the definition
            
       [min_x,min_y,max_x,max_y]
            
note that x is the horizontal coordinate, increasing to the right on your screen, and
y is the vertical coordinate increasing downwards.
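
As a sketch of how this dictionary might be consumed, the code below flattens the
two-rectangle example from above into one entry per object, using label_map to
recover the shape names.  The names val_to_shape and list_objects are hypothetical.

```python
label_map = {'rectangle': 50, 'triangle': 100, 'disk': 150, 'oval': 200, 'star': 250}

mask_val_to_bbox_map = {200: [],
                        250: [],
                        100: [],
                         50: [[56, 20, 63, 25], [18, 16, 32, 36]],
                        150: [[37, 41, 55, 59]]}

# Invert label_map so that a mask value gives back the shape name.
val_to_shape = {val: shape for shape, val in label_map.items()}

def list_objects(bbox_map):
    """Flatten the map into (shape, upper_left, lower_right) tuples."""
    objects = []
    for val in sorted(bbox_map):
        for min_x, min_y, max_x, max_y in bbox_map[val]:
            objects.append((val_to_shape[val], (min_x, min_y), (max_x, max_y)))
    return objects

for obj in list_objects(mask_val_to_bbox_map):
    print(obj)
```

With x horizontal and y vertical, and assuming that the corner coordinates are
inclusive (which is not stated above), the pixels of a box in an array indexed as
[row, column] would be array[min_y:max_y+1, min_x:max_x+1].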

That's the end of my explanation of how each image is represented in the dataset.

I serialize the dataset with Python's pickle module and then compress it with the
gzip module.

The dataset files you will find in this directory are:

               PurdueShapes5MultiObject-10000-train.gz
               PurdueShapes5MultiObject-1000-test.gz
               PurdueShapes5MultiObject-20-train.gz
               PurdueShapes5MultiObject-20-test.gz


The number that follows the main name string "PurdueShapes5MultiObject-" is for the
number of images in the dataset.

You will find the last two datasets, with 20 images each, useful for debugging your
logic for semantic segmentation.
