CHANGE LOG
Version 2.0.4:
This version mostly changes the HTML formatting of this documentation
page. The code has not changed.
Version 2.0.3:
I have been experimenting with how to best incorporate adversarial
learning in the DLStudio module. That's what accounts for the jump from
the previous public release version 1.1.4 to new version 2.0.3. The
latest version comes with a separate class named AdversarialNetworks
for experimenting with different types of such networks for learning
data models through adversarial learning and generating instances from
the learned models. The AdversarialNetworks class includes two
Discriminator-Generator (DG) pairs and one Critic-Generator (CG)
pair. Of the two DG pairs, the first is based on the logic of DCGAN,
and the second a small modification of the first. The CG pair is based
on the logic of Wasserstein GAN. This version of the module also comes
with a new examples directory, ExamplesAdversarialNetworks, that
contains example scripts that show how you can call the different DG
and CG pairs in the AdversarialNetworks class. Also included is a new
dataset I have created, PurdueShapes5GAN-20000, that contains 20,000
images of size 64x64 for experimenting with the GANs in this module.
Version 1.1.4:
This version has a new design for the text classification class
TEXTnetOrder2. This has entailed new scripts for training and testing
when using the new version of that class. It also includes a fix for a
bug discovered in Version 1.1.3.
Version 1.1.3:
The only change made in this version is to the class GRUnet that is
used for text classification. In the new version, the final output
of this network is based on the LogSoftmax activation.
Version 1.1.2:
This version adds code to the module for experimenting with recurrent
neural networks (RNN) for classifying variable-length text input. With
an RNN, a variable-length text input can be characterized with a hidden
state vector of a fixed size. The text processing capabilities of the
module allow you to compare the results that you may obtain with and
without using a GRU. For such experiments, this version also comes with
a text dataset based on an old archive of product reviews made
available by Amazon.
Version 1.1.1:
This version fixes the buggy behavior of the module when using the
'depth' parameter to change the size of a network.
Version 1.1.0:
The main reason for this version was my observation that when the
training data is intentionally corrupted with a high level of noise, it
is possible for the output of regression to be a NaN (Not a Number).
In my testing at noise levels of 20%, 50%, and 80%, while you do not
see this problem when the noise level is 20%, it definitely becomes a
problem when the noise level is at 50%. To deal with this issue, this
version includes the test 'torch.isnan()' in the training and testing
code for object detection. This version of the module also provides
additional datasets of noise-corrupted images with different levels
of noise. However, since the total size of the datasets now exceeds
the file-size limit at 'https://pypi.org', you'll need to download them
separately from the link provided in the main documentation page.
Version 1.0.9:
With this version, you can now use DLStudio for experiments in semantic
segmentation of images. The code added to the module is in a new inner
class that, as you might guess, is named SemanticSegmentation. The
workhorse of this inner class is a new implementation of the famous
Unet that I have named mUnet --- the prefix "m" stands for "multi" for
the ability of the network to segment out multiple objects
simultaneously. This version of DLStudio also comes with a new
dataset, PurdueShapes5MultiObject, for experimenting with mUnet. Each
image in this dataset contains a random number of selections from five
different shapes --- rectangle, triangle, disk, oval, and star --- that
are randomly scaled, oriented, and located in each image.
Version 1.0.7:
The main reason for creating this version of DLStudio is to be able to
use the module for illustrating how to simultaneously carry out
classification and regression (C&R) with the same convolutional
network. The specific C&R problem that is solved in this version is
the problem of object detection and localization. You want a CNN to
categorize the object in an image and, at the same time, estimate the
bounding-box for the detected object. Estimating the bounding-box is
referred to as regression. All of the code related to object detection
and localization is in the inner class DetectAndLocalize of the main
module file. Training a CNN to solve the detection and localization
problem requires a dataset that, in addition to the class labels for
the objects, also provides bounding-box annotations for the objects.
Towards that end, this version also comes with a new dataset called
PurdueShapes5. Another new inner class, CustomDataLoading, that is
also included in Version 1.0.7 has the dataloader for the PurdueShapes5
dataset.
Version 1.0.6:
This version has the bugfix for a bug in SkipBlock that was spotted by
a student as I was demonstrating in class the concepts related to the
use of skip connections in deep neural networks.
Version 1.0.5:
This version includes an inner class, SkipConnections, for
experimenting with skip connections to improve the performance of a
deep network. The Examples subdirectory of the distribution includes a
script, playing_with_skip_connections.py, that demonstrates how you can
experiment with SkipConnections. The network class used by
SkipConnections is named BMEnet with an easy-to-use interface for
experimenting with networks of arbitrary depth.
Version 1.0.4:
I have added one more inner class, AutogradCustomization, to the module
that illustrates how to extend Autograd if you want to endow it with
additional functionality. And, most importantly, this version fixes an
important bug that caused wrong information to be written out to the
disk when you tried to save the learned model at the end of a training
session. I have also cleaned up the comment blocks in the
implementation code.
Version 1.0.3:
This is the first public release version of this module.
INTRODUCTION
Every design activity involves mixing and matching things and doing so
repeatedly until you have achieved the desired results. The same thing
is true of modern deep learning networks. When you are working with a
new data domain, it is likely that you would want to experiment with
different network layouts that you may have dreamed of yourself or that
you may have seen somewhere in a publication or at some web site.
The goal of this module is to make it easier to engage in this process.
The idea is that you would drop in the module a new network and you
would be able to see right away the results you would get with the new
network.
This module also allows you to specify a network with a configuration
string. The module parses the string and creates the network. In
upcoming revisions of this module, I am planning to add additional
features to this approach in order to make it more general and more
useful for production work.
EXTENDING AUTOGRAD
Version 1.0.4 of DLStudio incorporates a new inner class,
AutogradCustomization, for illustrating how you can write your own code
for customizing the behavior of PyTorch's Autograd module. Your
starting point for understanding the code in AutogradCustomization
should be the following script in the Examples directory of the distro:
extending_autograd.py
Extending Autograd requires that you define a new verb class --- as I
have with the class DoSillyWithTensor shown in the main module file ---
with definitions for two static methods, "forward()" and "backward()".
Note that an instance constructed from this class is callable.
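As a minimal sketch of the two-static-method pattern described above, here
is a custom Autograd function. This is not the DoSillyWithTensor class from
the module; the class name ScaleByTwo is hypothetical and chosen only for
illustration. In current PyTorch releases such a function is invoked through
its apply() class method, which is what the sketch uses.

```python
import torch

class ScaleByTwo(torch.autograd.Function):
    """Hypothetical example: y = 2 * x with a hand-written gradient."""

    @staticmethod
    def forward(ctx, x):
        # Nothing needs to be saved on ctx for this simple function.
        return 2.0 * x

    @staticmethod
    def backward(ctx, grad_output):
        # dy/dx = 2, so the incoming gradient is scaled by 2.
        return 2.0 * grad_output

x = torch.tensor([1.0, -3.0], requires_grad=True)
y = ScaleByTwo.apply(x)
y.sum().backward()      # fills x.grad using the backward() defined above
```

After the call to backward(), x.grad holds the gradient computed by the
hand-written backward() rather than by PyTorch's built-in rules.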
SKIP CONNECTIONS
Starting with Version 1.0.6, you can now experiment with skip
connections in a CNN to see how a deep network with this feature might
yield improved classification results. Deep networks suffer from the
problem of vanishing gradients that degrades their performance.
Vanishing gradients means that the gradients of the loss calculated in
the early layers of a network become increasingly muted as the network
becomes deeper. An important mitigation strategy for addressing this
problem consists of creating a CNN using blocks with skip connections.
The code for using skip connections is in the inner class
SkipConnections of the module. And the network that allows you to
construct a CNN with skip connections is named BMEnet. As shown in the
script playing_with_skip_connections.py in the Examples directory of
the distribution, you can easily create a CNN with arbitrary depth just
by using the constructor option "depth" for BMEnet. The basic block of
the network constructed in this manner is called SkipBlock which, very
much like the BasicBlock in ResNet-18, has a couple of convolutional
layers whose output is combined with the input to the block.
Note that the value given to the "depth" constructor option for the
BMEnet class does NOT translate directly into the actual depth of the
CNN. [Again, see the script playing_with_skip_connections.py in the
Examples directory for how to use this option.] The value of "depth" is
translated into how many instances of SkipBlock to use for constructing
the CNN.
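The relationship between "depth" and the number of blocks can be sketched
as follows. This is an illustration only and not the SkipBlock or BMEnet
code from the module; the names TinySkipBlock and make_blocks are
hypothetical, and the layer choices (3x3 convolutions, batch normalization)
are my assumptions.

```python
import torch
import torch.nn as nn

class TinySkipBlock(nn.Module):
    """Hypothetical block: two 3x3 convolutions plus an optional shortcut."""
    def __init__(self, channels, skip_connections=True):
        super().__init__()
        self.skip_connections = skip_connections
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        if self.skip_connections:
            out = out + x          # the shortcut: add the block input back in
        return torch.relu(out)

def make_blocks(channels, depth, skip_connections=True):
    # "depth" counts blocks, not layers: each block holds two convolutions.
    return nn.Sequential(*[TinySkipBlock(channels, skip_connections)
                           for _ in range(depth)])

net = make_blocks(channels=8, depth=4)
out = net(torch.randn(2, 8, 32, 32))
num_convs = sum(isinstance(m, nn.Conv2d) for m in net.modules())
```

With depth=4 the sketch contains eight convolutional layers, which is why
the value of "depth" should not be read as the layer count of the network.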
If you want to use DLStudio for learning how to create your own
versions of SkipBlock-like shortcuts in a CNN, your starting point
should be the following script in the Examples directory of the distro:
playing_with_skip_connections.py
This script illustrates how to use the inner class BMEnet of the module
for experimenting with skip connections in a CNN. As the script shows,
the constructor of the BMEnet class comes with two options:
skip_connections and depth. By turning the first on and off, you can
directly illustrate in a classroom setting the improvement you can get
with skip connections. And by giving an appropriate value to the
"depth" option, you can show results for networks of different depths.
OBJECT DETECTION AND LOCALIZATION
The code for how to solve the problem of object detection and
localization with a CNN is in the inner classes DetectAndLocalize and
CustomDataLoading. This code was developed for version 1.0.7 of the
module. In general, object detection and localization problems are
more challenging than pure classification problems because solving the
localization part requires regression for the coordinates of the
bounding box that localizes the object. If at all possible, you would
want the same CNN to provide answers to both the classification and the
regression questions and do so at the same time. This calls for a CNN
to possess two different output layers, one for classification and the
other for regression. A deep network that does exactly that is
illustrated by the LOADnet classes that are defined in the inner class
DetectAndLocalize of the DLStudio module. [By the way, the acronym
"LOAD" in "LOADnet" stands for "LOcalization And Detection".] Although
you will find three versions of the LOADnet class inside
DetectAndLocalize, for now only pay attention to the LOADnet2 class
since that is the one I have worked with the most for creating the
1.0.7 distribution.
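The idea of one network with two output layers, trained with two losses at
once, can be sketched as below. This is not the LOADnet2 code; the class
name TwoHeadNet, the layer sizes, and the choice of cross-entropy and
mean-squared-error losses are assumptions made for the illustration, and
the input tensors are random stand-ins for 32x32 images and their
annotations.

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    """Hypothetical network: shared conv features, two output layers."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.class_head = nn.Linear(32 * 8 * 8, num_classes)  # class scores
        self.bbox_head = nn.Linear(32 * 8 * 8, 4)             # 4 box coordinates

    def forward(self, x):
        f = self.features(x)
        return self.class_head(f), self.bbox_head(f)

net = TwoHeadNet()
images = torch.randn(4, 3, 32, 32)       # made-up batch of 32x32 images
labels = torch.tensor([0, 1, 2, 3])      # made-up class labels
boxes = torch.rand(4, 4) * 31.0          # made-up target boxes

scores, pred_boxes = net(images)
# One loss per head; summing them lets a single backward pass train both.
loss = nn.CrossEntropyLoss()(scores, labels) + nn.MSELoss()(pred_boxes, boxes)
loss.backward()
```

Because both heads share the convolutional features, the gradients of both
losses flow into the same early layers.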
As you would expect, training a CNN for object detection and
localization requires a dataset that, in addition to the class labels
for the images, also provides bounding-box annotations for the objects
in the images. Out of my great admiration for the CIFAR-10 dataset as
an educational tool for solving classification problems, I have created
small-image-format training and testing datasets for illustrating the
code devoted to object detection and localization in this module. The
training dataset is named PurdueShapes5-10000-train.gz and it consists
of 10,000 images, with each image of size 32x32 containing one of five
possible shapes --- rectangle, triangle, disk, oval, and star. The
shape objects in the images are randomized with respect to size,
orientation, and color. The testing dataset is named
PurdueShapes5-1000-test.gz and it contains 1000 images generated by the
same randomization process as used for the training dataset. You will
find these datasets in the "data" subdirectory of the "Examples"
directory in the distribution.
Providing a new dataset for experiments with detection and localization
meant that I also needed to supply a custom dataloader for the dataset.
Toward that end, Version 1.0.7 also includes another inner class named
CustomDataLoading where you will find my implementation of the custom
dataloader for the PurdueShapes5 dataset.
If you want to use DLStudio for learning how to write your own PyTorch
code for object detection and localization, your starting point should
be the following script in the Examples directory of the distro:
object_detection_and_localization.py
Execute the script and understand what functionality of the inner class
DetectAndLocalize it invokes for object detection and localization.
NOISY OBJECT DETECTION AND LOCALIZATION
When the training data is intentionally corrupted with a high level of
noise, it is possible for the output of regression to be a NaN (Not a
Number). Here is what I observed when I tested the LOADnet2 network at
noise levels of 20%, 50%, and 80%: At 20% noise, both the labeling and
the regression accuracies become worse compared to the noiseless case,
but they would still be usable depending on the application. For
example, with two epochs of training, the overall classification
accuracy decreases from 91% to 83% and the regression error increases
from under a pixel (on the average) to around 3 pixels. However, when
the level of noise is increased to 50%, the regression output is often
a NaN (Not a Number), as represented by 'numpy.nan' or 'torch.nan'. To
deal with this problem, Version 1.1.0 of the DLStudio module checks the
output of the bounding-box regression before drawing the rectangles on
the images.
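A check of this kind might look like the following sketch. It uses
torch.isnan(), which the module is stated to rely on, but the helper name
usable_boxes and the sample predictions are hypothetical and not taken from
the module.

```python
import torch

def usable_boxes(pred_boxes):
    """Return a boolean mask marking predictions with no NaN coordinate."""
    return ~torch.isnan(pred_boxes).any(dim=1)

# Two made-up bounding-box predictions; the second contains a NaN.
pred = torch.tensor([[2.0, 3.0, 20.0, 25.0],
                     [float('nan'), 1.0, 9.0, 9.0]])
mask = usable_boxes(pred)
good = pred[mask]        # only these rows would be drawn as rectangles
```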
If you wish to experiment with detection and localization in the
presence of noise, your starting point should be the script
noisy_object_detection_and_localization.py
in the Examples directory of the distribution. Note that you would
need to download the datasets for such experiments directly from the
link provided near the top of this documentation page.
SEMANTIC SEGMENTATION
The code for how to carry out semantic segmentation is in the inner
class that is appropriately named SemanticSegmentation. At its
simplest, the purpose of semantic segmentation is to assign correct
labels to the different objects in a scene, while localizing them at
the same time. At a more sophisticated level, a system that carries
out semantic segmentation should also output a symbolic expression that
reflects an understanding of the scene in the image that is based on
the objects found in the image and their spatial relationships with one
another. The code in the new inner class is based on only the simplest
possible definition of what is meant by semantic segmentation.
The convolutional network that carries out semantic segmentation in
DLStudio is named mUnet, where the letter "m" is short for "multi",
which, in turn, stands for the fact that mUnet is capable of segmenting
out multiple objects simultaneously from an image. The mUnet network is
based on the now famous Unet network that was first proposed by
Ronneberger, Fischer and Brox in the paper "U-Net: Convolutional
Networks for Biomedical Image Segmentation". Their UNET extracts
binary masks for the cell pixel blobs of interest in biomedical images.
The output of UNET can therefore be treated as a pixel-wise binary
classifier at each pixel position. The mUnet class, on the other hand,
is intended for segmenting out multiple objects simultaneously from an
image. [A weaker reason for "m" in the name of the class is that it
uses skip connections in multiple ways --- such connections are used
not only across the two arms of the "U", but also along the arms.
The skip connections in the original Unet are only between the two arms
of the U.]
mUnet works by assigning a separate channel in the output of the
network to each different object type. After the network is trained,
for a given input image, all you have to do is examine the different
channels of the output for the presence or the absence of the objects
corresponding to the channel index.
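Examining the output channels could be done along the lines of the sketch
below. The tensor is a hand-made stand-in for a network output, and the
function name shapes_present, the channel-to-shape ordering, the threshold,
and the minimum pixel count are all hypothetical choices for illustration.

```python
import torch

shape_names = ['rectangle', 'triangle', 'disk', 'oval', 'star']

# Stand-in for a network output of shape (channels, H, W): one channel per shape.
output = torch.zeros(5, 64, 64)
output[0, 10:30, 10:30] = 0.9     # pretend the rectangle channel fired
output[4, 40:50, 40:60] = 0.8     # pretend the star channel fired

def shapes_present(output, threshold=0.5, min_pixels=20):
    found = []
    for idx, name in enumerate(shape_names):
        mask = output[idx] > threshold        # binary mask for this channel
        if int(mask.sum()) >= min_pixels:     # enough pixels to count as present
            found.append(name)
    return found

found = shapes_present(output)
```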
This version of DLStudio also comes with a new dataset,
PurdueShapes5MultiObject, for experimenting with mUnet. Each image
in this dataset contains a random number of selections from five
different shapes, with the shapes being randomly scaled, oriented, and
located in each image. The five different shapes are: rectangle,
triangle, disk, oval, and star.
Your starting point for learning how to use the mUnet network for
segmenting images should be the following script in the Examples
directory of the distro:
semantic_segmentation.py
Execute the script and understand how it uses the functionality packed
in the inner class SemanticSegmentation for segmenting out the objects
in an image.
TEXT CLASSIFICATION
Starting with Version 1.1.2, the module includes an inner class
TextClassification that allows you to do simple experiments with neural
networks with feedback (that are also called Recurrent Neural
Networks). With an RNN, textual data of arbitrary length can be
characterized with a hidden state vector of a fixed size. To
facilitate text based experiments, this module also comes with text
datasets derived from an old Amazon archive of product reviews.
Further information regarding the datasets is in the comment block
associated with the class SentimentAnalysisDataset. If you want to use
DLStudio for experimenting with text, your starting points should be
the following three scripts in the Examples directory of the
distribution:
text_classification_with_TEXTnet_no_gru.py
text_classification_with_TEXTnetOrder2_no_gru.py
text_classification_with_gru.py
The first of these is meant to be used with the TEXTnet network that
does not include any protection against the vanishing gradients problem
that a poorly designed RNN can suffer from. The second script
mentioned above is based on the TEXTnetOrder2 network and it includes
rudimentary protection, but not enough to suffice for any practical
application. The purpose of TEXTnetOrder2 is to serve as an
educational stepping stone to a GRU (Gated Recurrent Unit) network that
is used in the third script listed above.
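The point that text of any length is summarized by a hidden state of fixed
size can be seen with PyTorch's own nn.GRU, as in the sketch below. This is
not the GRUnet class from the module; the word-vector size of 8, the hidden
size of 16, and the random inputs are assumptions made for the
illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

short_review = torch.randn(1, 5, 8)     # 5 made-up word vectors of size 8
long_review = torch.randn(1, 40, 8)     # 40 made-up word vectors of size 8

# The second return value is the final hidden state for each sequence.
_, h_short = gru(short_review)
_, h_long = gru(long_review)
```

Both final hidden states have the same shape even though one review is
eight times longer than the other, so the same classification layer can be
applied to either.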
DATA MODELING WITH ADVERSARIAL LEARNING
Starting with version 2.0.3, DLStudio includes a separate class named
AdversarialNetworks for experimenting with different adversarial
learning approaches for data modeling. Adversarial Learning consists
of simultaneously training a Generator and a Discriminator (or, a
Generator and a Critic) with the goal of getting the Generator to
produce from pure noise images that look like those in the training
dataset. When Generator-Discriminator pairs are used, the
Discriminator's job is to become an expert at recognizing the training
images so it can let us know should the generator produce an image that
does not look like what is in the training dataset. The output of the
Discriminator consists of the probability that the input to the
discriminator is like one of the training images.
On the other hand, when a Generator-Critic pair is used, the Critic's
job is to become adept at estimating the distance between the
distribution that corresponds to the training dataset and the
distribution that has been learned by the Generator so far. If the
distance between the distributions is differentiable with respect to
the weights in the networks, one could backprop the distance and update
the weights in an iterative training loop. This is roughly the idea of
the Wasserstein GAN that is incorporated as a Critic-Generator pair CG1
in the AdversarialNetworks class.
The AdversarialNetworks class includes two kinds of adversarial
networks for data modeling: DCGAN and WGAN.
DCGAN, short for "Deep Convolutional Generative Adversarial Network",
owes its origins to the paper "Unsupervised Representation Learning
with Deep Convolutional Generative Adversarial Networks" by Radford et
al. DCGAN was the first fully convolutional network for GANs
(Generative Adversarial Networks). CNNs typically have a
fully-connected layer (an instance of nn.Linear) at the topmost level.
For the topmost layer in the Generator network, DCGAN uses another
convolution layer that produces the final output image. And for the
topmost layer of the Discriminator, DCGAN flattens the output and feeds
that into a sigmoid function for producing a scalar value. Additionally,
DCGAN also gets rid of max-pooling for downsampling and instead uses
convolutions with strides. Yet another feature of a DCGAN is the use
of batch normalization in all layers, except in the output layer of the
Generator and the input layer of the Discriminator. As the authors of
DCGAN stated, while, in general, batch normalization stabilizes
learning by normalizing the input to each layer to have zero mean and
unit variance, applying it at the output resulted in sample oscillation
and model instability. I have also retained in the DCGAN code the
leaky ReLU activation recommended by the authors for the Discriminator.
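The DCGAN design rules listed above can be put together in a small sketch
for 64x64 images. This is not the DG1 code from the AdversarialNetworks
class; the channel counts, the 100-element noise vector, and the helper
names gen_block and disc_block are assumptions made for the illustration.

```python
import torch
import torch.nn as nn

def gen_block(c_in, c_out):
    # Strided transpose convolution doubles the spatial size; batch norm inside.
    return [nn.ConvTranspose2d(c_in, c_out, 4, 2, 1, bias=False),
            nn.BatchNorm2d(c_out), nn.ReLU(True)]

def disc_block(c_in, c_out, use_bn=True):
    # Strided convolution halves the spatial size; no max-pooling anywhere.
    layers = [nn.Conv2d(c_in, c_out, 4, 2, 1, bias=False)]
    if use_bn:
        layers.append(nn.BatchNorm2d(c_out))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return layers

generator = nn.Sequential(
    nn.ConvTranspose2d(100, 128, 4, 1, 0, bias=False),   # 1x1 -> 4x4
    nn.BatchNorm2d(128), nn.ReLU(True),
    *gen_block(128, 64),                                  # 8x8
    *gen_block(64, 32),                                   # 16x16
    *gen_block(32, 16),                                   # 32x32
    nn.ConvTranspose2d(16, 3, 4, 2, 1, bias=False),       # 64x64, no batch norm
    nn.Tanh(),
)

discriminator = nn.Sequential(
    *disc_block(3, 16, use_bn=False),                     # input layer: no batch norm
    *disc_block(16, 32),
    *disc_block(32, 64),
    *disc_block(64, 128),                                 # 4x4
    nn.Conv2d(128, 1, 4, 1, 0, bias=False),               # 4x4 -> 1x1
    nn.Flatten(0),                                        # one scalar per image
    nn.Sigmoid(),
)

noise = torch.randn(2, 100, 1, 1)
fake = generator(noise)
prob = discriminator(fake)
```

Neither network contains an nn.Linear layer or a max-pooling layer, and
batch normalization is omitted at the Generator output and the
Discriminator input.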
The other adversarial learning framework incorporated in
AdversarialNetworks is based on WGAN, which stands for Wasserstein GAN.
This GAN was proposed in the paper "Wasserstein GAN" by Arjovsky,
Chintala, and Bottou. WGAN is based on estimating the Wasserstein
distance between the distribution that corresponds to the training
images and the distribution that has been learned so far by the
Generator. The authors of WGAN have shown that computing this
distance is the same as maximizing the difference between the
expectations of a to-be-learned 1-Lipschitz function applied to the
individual samples drawn from the two distributions. The challenge
then becomes how to enforce the
1-Lipschitz continuity on the function being learned during training.
The WGAN authors have proposed an ad hoc strategy that appears to work
--- at least on some datasets. The strategy consists of clipping the
parameters of the Critic Network, whose job is to learn the 1-Lipschitz
function, to a narrow band of values as an ad hoc attempt at achieving
the continuity property of such functions.
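One Critic update with parameter clipping can be sketched as follows. This
is not the CG1 code from the AdversarialNetworks class; the tiny
fully-connected Critic and the random stand-in data are assumptions made
for the illustration, while the clipping threshold of 0.01 and the RMSprop
learning rate of 5e-5 are the defaults reported in the WGAN paper.

```python
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(10, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))
optimizer = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
clip = 0.01                               # clipping threshold

real = torch.randn(16, 10)                # stand-in for training samples
fake = torch.randn(16, 10) + 1.0          # stand-in for Generator output

# The Critic maximizes E[f(real)] - E[f(fake)], so we minimize its negative.
loss = -(critic(real).mean() - critic(fake).mean())
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Clip every Critic parameter to [-clip, clip] after the update.
with torch.no_grad():
    for p in critic.parameters():
        p.clamp_(-clip, clip)

max_abs = max(p.abs().max().item() for p in critic.parameters())
```

Keeping all parameters inside a narrow band bounds how steeply the Critic's
output can change with its input, which is the sense in which clipping
stands in for the 1-Lipschitz constraint.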
If you wish to use the DLStudio module to learn about data modeling
with adversarial learning, your entry points should be the following
scripts in the ExamplesAdversarialNetworks directory of the distro:
1. dcgan_multiobj_DG1.py
2. dcgan_multiobj_smallmod_DG2.py
3. wgan_multiobj_CG1.py
The first script demonstrates the DCGAN logic on the PurdueShapes5GAN
dataset. In order to show the sensitivity of the basic DCGAN logic to
any variations in the network or the weight initializations, the second
script introduces a small change in the network. The third script is a
demonstration of using the Wasserstein distance for data modeling
through adversarial learning. The results produced by these scripts
(for the constructor options shown in the scripts) are included in a
subdirectory named RVLCloud_based_results.
INSTALLATION
The DLStudio class was packaged using setuptools. For
installation, execute the following command in the source directory
(this is the directory that contains the setup.py file after you have
downloaded and uncompressed the package):
sudo python setup.py install
and/or, for the case of Python3,
sudo python3 setup.py install
On Linux distributions, this will install the module file at a location
that looks like
/usr/local/lib/python2.7/dist-packages/
and, for the case of Python3, at a location that looks like
/usr/local/lib/python3.6/dist-packages/
If you do not have root access, you have the option of working directly
off the directory in which you downloaded the software by simply
placing the following statements at the top of your scripts that use
the DLStudio class:
import sys
sys.path.append( "pathname_to_DLStudio_directory" )
To uninstall the module, simply delete the source directory, locate
where the DLStudio module was installed with "locate
DLStudio" and delete those files. As mentioned above,
the full pathname to the installed version is likely to look like
/usr/local/lib/python2.7/dist-packages/DLStudio*
If you want to carry out a non-standard install of the
DLStudio module, look up the on-line information on
Distutils by pointing your browser to
http://docs.python.org/dist/dist.html
USAGE
If you want to specify a network with just a configuration string,
your usage of the module is going to look like:
from DLStudio import *
convo_layers_config = "1x[128,3,3,1]-MaxPool(2) 1x[16,5,5,1]-MaxPool(2)"
fc_layers_config = [-1,1024,10]
dls = DLStudio( dataroot = "/home/kak/ImageDatasets/CIFAR-10/",
                image_size = [32,32],
                convo_layers_config = convo_layers_config,
                fc_layers_config = fc_layers_config,
                path_saved_model = "./saved_model",
                momentum = 0.9,
                learning_rate = 1e-3,
                epochs = 2,
                batch_size = 4,
                classes = ('plane','car','bird','cat','deer',
                           'dog','frog','horse','ship','truck'),
                use_gpu = True,
                debug_train = 0,
                debug_test = 1,
              )
configs_for_all_convo_layers = dls.parse_config_string_for_convo_layers()
convo_layers = dls.build_convo_layers2( configs_for_all_convo_layers )
fc_layers = dls.build_fc_layers()
model = dls.Net(convo_layers, fc_layers)
dls.show_network_summary(model)
dls.load_cifar_10_dataset()
dls.run_code_for_training(model)
dls.run_code_for_testing(model)
or, if you would rather experiment with a drop-in network, your usage
of the module is going to look something like:
dls = DLStudio( dataroot = "/home/kak/ImageDatasets/CIFAR-10/",
                image_size = [32,32],
                path_saved_model = "./saved_model",
                momentum = 0.9,
                learning_rate = 1e-3,
                epochs = 2,
                batch_size = 4,
                classes = ('plane','car','bird','cat','deer',
                           'dog','frog','horse','ship','truck'),
                use_gpu = True,
                debug_train = 0,
                debug_test = 1,
              )
exp_seq = DLStudio.ExperimentsWithSequential( dl_studio = dls ) ## for your drop-in network
exp_seq.load_cifar_10_dataset_with_augmentation()
model = exp_seq.Net()
dls.show_network_summary(model)
exp_seq.run_code_for_training(model)
exp_seq.run_code_for_testing(model)
This assumes that you copy-and-pasted the network you want to
experiment with in a class like ExperimentsWithSequential that is
included in the module.
CONSTRUCTOR PARAMETERS
batch_size: Carries the usual meaning in the neural network context.
classes: A list of the symbolic names for the classes.
convo_layers_config: This parameter allows you to specify a convolutional network
with a configuration string. Must be formatted as explained in the
comment block associated with the method
"parse_config_string_for_convo_layers()"
dataroot: This points to where your dataset is located.
debug_test: Setting it allows you to see images being used and their predicted
class labels every 2000 batch-based iterations of testing.
debug_train: Does the same thing during training that debug_test does during
testing.
epochs: Specifies the number of epochs to be used for training the network.
fc_layers_config: This parameter allows you to specify the final
fully-connected portion of the network with just a list of
the number of nodes in each layer of this portion. The
first entry in this list must be the number '-1', which
stands for the fact that the number of nodes in the first
layer will be determined by the final activation volume of
the convolutional portion of the network.
image_size: The height x width size of the images in your dataset.
learning_rate: Again carries the usual meaning.
momentum: Carries the usual meaning and needed by the optimizer.
path_saved_model: The path to where you want the trained model to be
saved in your disk so that it can be retrieved later
for inference.
use_gpu: You must set it to True if you want the GPU to be used for training.
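To make the two configuration parameters more concrete, the sketch below
splits a convo_layers_config string into its numeric fields and shows how
the leading -1 in fc_layers_config can be resolved. This is not the
module's parser: the function names are hypothetical, the four bracketed
numbers are kept uninterpreted because their meaning is documented only in
the comment block of parse_config_string_for_convo_layers(), and the final
activation volume of 16x8x8 is an assumed value.

```python
import re

def parse_convo_config(config):
    """Split a string such as "1x[128,3,3,1]-MaxPool(2)" into plain records."""
    pattern = re.compile(
        r'(\d+)x\[(\d+),(\d+),(\d+),(\d+)\](?:-MaxPool\((\d+)\))?')
    records = []
    for group in config.split():
        m = pattern.fullmatch(group)
        if m is None:
            raise ValueError("cannot parse group: " + group)
        repeat, a, b, c, d, pool = m.groups()
        records.append({'repeat': int(repeat),
                        'numbers': [int(a), int(b), int(c), int(d)],
                        'maxpool': int(pool) if pool else None})
    return records

def resolve_fc_config(fc_layers_config, final_volume):
    """Replace the leading -1 by the flattened size of the final volume."""
    if fc_layers_config[0] != -1:
        raise ValueError("first entry must be -1")
    channels, height, width = final_volume
    return [channels * height * width] + list(fc_layers_config[1:])

records = parse_convo_config("1x[128,3,3,1]-MaxPool(2) 1x[16,5,5,1]-MaxPool(2)")
fc_sizes = resolve_fc_config([-1, 1024, 10], final_volume=(16, 8, 8))
```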
PUBLIC METHODS
(1) build_convo_layers()
This method creates the convolutional layers from the parameters
in the configuration string that was supplied through the
constructor option 'convo_layers_config'. The output produced by
the call to 'parse_config_string_for_convo_layers()' is supplied
as the argument to build_convo_layers().
(2) build_fc_layers()
From the list of ints supplied through the constructor option
'fc_layers_config', this method constructs the fully-connected
portion of the overall network.
(3) check_a_sampling_of_images()
Displays the first batch_size number of images in your dataset.
(4) display_tensor_as_image()
This method will display any tensor of shape (3,H,W), (1,H,W), or
just (H,W) as an image. If any further data normalization is
needed for constructing a displayable image, the method takes care
of that. It has two input parameters: one for the tensor you want
displayed as an image and the other for a title for the image
display. The latter parameter is default initialized to an empty
string.
(5) load_cifar_10_dataset()
This is just a convenience method that calls on Torchvision's
functionality for creating a data loader.
(6) load_cifar_10_dataset_with_augmentation()
This convenience method also creates a data loader but it also
includes the syntax for data augmentation.
(7) parse_config_string_for_convo_layers()
As mentioned in the Introduction, the DLStudio module allows you to
specify a convolutional network with a string provided the string
obeys the formatting convention described in the comment block of
this method. This method is for parsing such a string. The string
itself is presented to the module through the constructor option
'convo_layers_config'.
(8) run_code_for_testing()
This is the method that runs the trained model on the test data. Its
output is a confusion matrix for the classes and the overall
accuracy for each class. The method has one input parameter which
is set to the network to be tested. The learnable parameters in
the network are initialized with the disk-stored version of the
trained model.
(9) run_code_for_training()
This is the method that does all the training work. If a GPU was
detected at the time an instance of the module was created, this
method takes care of making the appropriate calls in order to
transfer the tensors involved into the GPU memory.
(10) save_model()
Writes the model out to the disk at the location specified by the
constructor option 'path_saved_model'. Has one input parameter
for the model that needs to be written out.
(11) show_network_summary()
Displays a print representation of your network and calls on the
torchsummary module to print out the shape of the tensor at the
output of each layer in the network. The method has one input
parameter which is set to the network whose summary you want to
see.
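The confusion matrix and per-class accuracy reported by
run_code_for_testing() can be illustrated with a few lines of plain Python.
This is a sketch and not the module's code; the function names, the
convention that rows are true classes and columns are predicted classes,
and the sample labels are assumptions made for the illustration.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    # Row = true class, column = predicted class.
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def per_class_accuracy(matrix):
    # Fraction of each true class that was predicted correctly.
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]

true_labels = [0, 0, 1, 1, 2, 2, 2, 2]     # made-up ground truth
predicted   = [0, 1, 1, 1, 2, 2, 0, 2]     # made-up predictions
cm = confusion_matrix(true_labels, predicted, num_classes=3)
acc = per_class_accuracy(cm)
overall = sum(cm[i][i] for i in range(3)) / len(true_labels)
```

The diagonal of the matrix counts the correct predictions, so the off-diagonal
entries show which classes are being mistaken for which.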
INNER CLASSES OF THE MODULE
By "inner classes" I mean the classes that are defined within the class
file DLStudio.py in the DLStudio directory of the distribution. The
module also includes what I have referred to as the Co-Classes in the
next section. A Co-Class resides at the same level of abstraction as
the main DLStudio class defined in the DLStudio.py file.
The purpose of the following two inner classes is to demonstrate how
you can create a custom class for your own network and test it within
the framework provided by the DLStudio module.
(1) class ExperimentsWithSequential
This class is my demonstration of experimenting with a network
that I found on GitHub. I copy-and-pasted it in this class to
test its capabilities. How to call on such a custom class is
shown by the following script in the Examples directory:
playing_with_sequential.py
(2) class ExperimentsWithCIFAR
This is very similar to the previous inner class, but uses a
common example of a network for experimenting with the CIFAR-10
dataset. Consisting of 32x32 images, this is a great dataset for
creating classroom demonstrations of convolutional networks.
How you should use this class is shown in the following
script
playing_with_cifar10.py
in the Examples directory of the distribution.
(3) class AutogradCustomization
The purpose of this class is to illustrate how to extend Autograd
with additional functionality. What's shown is an implementation of
the recommended approach at the following documentation page:
https://pytorch.org/docs/stable/notes/extending.html
(4) class SkipConnections
This class is for investigating the power of skip connections in
deep networks. Skip connections are used to mitigate a serious
problem associated with deep networks --- the problem of vanishing
gradients. It has been argued theoretically and demonstrated
empirically that as the depth of a neural network increases, the
gradients of the loss become more and more muted for the early
layers in the network.
(5) class DetectAndLocalize
The code in this inner class is for demonstrating how the same
convolutional network can simultaneously solve the twin problems of
object detection and localization. Note that, unlike the previous
four inner classes, class DetectAndLocalize comes with its own
implementations for the training and testing methods. The main
reason for that is that the training for detection and
localization must use two different loss functions simultaneously,
one for classification of the objects and the other for
regression. The function for testing is also a bit more involved
since it must now compute two kinds of errors, the classification
error and the regression error on the unseen data. Although you
will find a couple of different choices for the training and
testing functions for detection and localization inside
DetectAndLocalize, the ones I have worked with the most are the
following two methods, which are called by the scripts in the
Examples directory:
run_code_for_training_with_CrossEntropy_and_MSE_Losses()
run_code_for_testing_detection_and_localization()
(6) class CustomDataLoading
This is a testbed for experimenting with a completely ground-up
attempt at designing a custom data loader. Ordinarily, if the
basic format of how the dataset is stored is similar to one of the
datasets that Torchvision knows about, you can go ahead and use
that for your own dataset. At worst, you may need to carry out
some light customizations depending on the number of classes
involved, etc. However, if the underlying dataset is stored in a
manner that does not look like anything in Torchvision, you have
no choice but to supply all of the data loading infrastructure
yourself. That is what this inner class of the DLStudio
module is all about.
(7) class SemanticSegmentation
This inner class is for working with the mUnet convolutional
network for semantic segmentation of images. This network allows
you to segment out multiple objects simultaneously from an image.
Each object type is assigned a different channel in the output of
the network. So, for segmenting out the objects of a specified
type in a given input image, all you have to do is examine the
corresponding channel in the output.
(8) class TextClassification
The purpose of this inner class is to be able to use the DLStudio
module for simple experiments in text classification. Consider,
for example, the problem of automatic classification of
variable-length user feedback: you want to create a neural network
that can label an uploaded product review of arbitrary length as
positive or negative. One way to solve this problem is with a
Recurrent Neural Network in which you use a hidden state for
characterizing a variable-length product review with a
fixed-length state vector.
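The idea of summarizing a review of any length with a fixed-length state vector can be sketched in a few lines of plain Python. This is a toy illustration only and not the TextClassification code: the function encode_review, the integer token ids, and the deterministic "weights" are all hypothetical stand-ins for what a trained recurrent network would learn, and each state element here is updated independently for brevity.

```python
import math

def encode_review(token_ids, hidden_size=4):
    """Fold a variable-length list of integer token ids into a fixed-length state."""
    hidden = [0.0] * hidden_size
    for token in token_ids:
        new_hidden = []
        for j in range(hidden_size):
            # Toy deterministic "weights"; a real RNN would learn these values.
            w_in = math.sin(token + j)
            w_rec = 0.5 * math.cos(j + 1)
            # Mix the current token with the previous state; tanh keeps values in (-1, 1).
            new_hidden.append(math.tanh(w_in + w_rec * hidden[j]))
        hidden = new_hidden
    return hidden

short_review = [3, 17, 42]
long_review = [3, 17, 42, 8, 8, 99, 5, 23, 61, 7]
print(len(encode_review(short_review)), len(encode_review(long_review)))
```

Both reviews end up as a vector of the same length, which is what allows a fixed-size classification layer to follow the recurrence.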
CO-CLASSES OF THE MODULE
As I stated at the beginning of the previous section, a Co-Class
resides at the same level of abstraction as the main DLStudio class
defined in the DLStudio.py file.
As of Version 2.0.3, the module contains only one co-class,
AdversarialNetworks, that is defined in the directory of the same name
in the distribution.
As I mentioned in the Introduction, the purpose of the
AdversarialNetworks class is to demonstrate probabilistic data modeling
using Generative Adversarial Networks (GAN). GANs use
Discriminator-Generator or Critic-Generator pairs to learn
probabilistic data models that can subsequently be used to create new
image instances that look surprisingly similar to those in the training
dataset. At the moment, you will find the following three such pairs
inside the AdversarialNetworks class:
1. Discriminator-Generator DG1 --- implements the DCGAN logic
2. Discriminator-Generator DG2 --- a slight modification of the previous
3. Critic-Generator CG1 --- implements the Wasserstein GAN logic
In the ExamplesAdversarialNetworks directory of the distro you will see
the following scripts that demonstrate adversarial learning as
incorporated in the above networks:
1. dcgan_multiobj_DG1.py --- demonstrates the DCGAN DG1
2. dcgan_multiobj_smallmod_DG2.py --- demonstrates the DCGAN DG2
3. wgan_multiobj_CG1.py --- demonstrates the Wasserstein GAN CG1
All of these scripts use the training dataset PurdueShapes5GAN that
consists of 20,000 images containing randomly shaped, randomly
colored, and randomly positioned objects in 64x64 arrays. The dataset
comes in the form of a gzipped archive named
"datasets_for_AdversarialNetworks.tar.gz" that is provided under the
link "Download the image dataset for AdversarialNetworks" at the top of
the HTML version of this doc page. See the README in the
ExamplesAdversarialNetworks directory for how to unpack the archive.
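To make the distinction between a Discriminator and a Critic concrete, here is a minimal sketch of the two kinds of loss, written in plain Python rather than taken from the AdversarialNetworks class. It assumes the usual formulations: a DCGAN-style discriminator outputs probabilities and is trained with binary cross-entropy, while a Wasserstein critic outputs unbounded scores and is trained to widen the gap between the mean scores for real and generated images. The function names and the sample numbers are hypothetical, and the sketch leaves out the Lipschitz constraint on the critic that Wasserstein GAN training also requires.

```python
import math

def discriminator_bce_loss(real_probs, fake_probs):
    """DCGAN-style loss: inputs are sigmoid outputs in (0, 1)."""
    real_term = -sum(math.log(p) for p in real_probs) / len(real_probs)
    fake_term = -sum(math.log(1.0 - p) for p in fake_probs) / len(fake_probs)
    return real_term + fake_term

def critic_wasserstein_loss(real_scores, fake_scores):
    """WGAN-style loss: inputs are unbounded scores; lower means a wider gap."""
    mean_real = sum(real_scores) / len(real_scores)
    mean_fake = sum(fake_scores) / len(fake_scores)
    return mean_fake - mean_real

# Hypothetical network outputs for three real and three generated images.
print(discriminator_bce_loss([0.9, 0.8, 0.95], [0.1, 0.2, 0.05]))
print(critic_wasserstein_loss([2.5, 1.0, 3.0], [-1.5, -0.5, -2.0]))
```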
Examples DIRECTORY
The Examples subdirectory in the distribution contains the following
scripts:
(1) playing_with_reconfig.py
Shows how you can specify a convolution network with a
configuration string. The DLStudio module parses the string and
constructs the network.
(2) playing_with_sequential.py
Shows you how you can call on a custom inner class of the
'DLStudio' module that is meant to experiment with your own
network. The name of the inner class in this example script is
ExperimentsWithSequential
(3) playing_with_cifar10.py
This is very similar to the previous example script but is based
on the inner class ExperimentsWithCIFAR which uses more common
examples of networks for playing with the CIFAR-10 dataset.
(4) extending_autograd.py
This provides a demonstration example of the recommended approach
for giving additional functionality to Autograd --- as mentioned
in the comment made above about the inner class
AutogradCustomization.
(5) playing_with_skip_connections.py
This script illustrates how to use the inner class BMEnet of the
module for experimenting with skip connections in a CNN. As the
script shows, the constructor of the BMEnet class comes with two
options: skip_connections and depth. By turning the first on and
off, you can directly illustrate in a classroom setting the
improvement you can get with skip connections. And by giving an
appropriate value to the "depth" option, you can show results for
networks of different depths.
(6) custom_data_loading.py
This script shows how to use the custom dataloader in the inner
class CustomDataLoading of the DLStudio module. That custom
dataloader is meant specifically for the PurdueShapes5 dataset
that is used in object detection and localization experiments in
DLStudio.
(7) object_detection_and_localization.py
This script shows how you can use the functionality provided by
the inner class DetectAndLocalize of the DLStudio module for
experimenting with object detection and localization. Detecting
and localizing (D&L) objects in images is a more difficult problem
than just classifying the objects. D&L requires that your CNN
make two different types of inferences simultaneously, one for
classification and the other for localization. For the
localization part, the CNN must carry out what is known as
regression. What that means is that the CNN must output the
numerical values for the bounding box that encloses the object
that was detected. Generating these two types of inferences
requires two different loss functions, one for classification and
the other for regression.
(8) noisy_object_detection_and_localization.py
This script in the Examples directory is exactly the same as the
one described above; the only difference is that it calls on the
noise-corrupted training and testing dataset files. I thought it
would be best to create a separate script for studying the effects
of noise, just to allow for the possibility that the noise-related
studies with DLStudio may evolve differently in the future.
(9) semantic_segmentation.py
This script should be your starting point if you wish to learn how
to use the mUnet neural network for semantic segmentation of
images. As mentioned elsewhere in this documentation page, mUnet
assigns an output channel to each different type of object that
you wish to segment out from an image. So, given a test image at
the input to the network, all you have to do is to examine each
channel at the output for segmenting out the objects that
correspond to that output channel.
(10) text_classification_with_TEXTnet_no_gru.py
This and the next two scripts should be your starting points if
you wish to use DLStudio for experimenting with neural networks
with feedback. The main purpose of this script, which is based on
the TEXTnet network, is to demonstrate that unless you do
something to address the vanishing gradient problem (which can
become particularly acute when using feedback in a neural
network), you are not likely to get usable results from such a
learning framework.
(11) text_classification_with_TEXTnetOrder2_no_gru.py
This text classification script is based on the TEXTnetOrder2
network and its purpose is to serve as a stepping stone to using a
full-blown GRU network in the next script.
(12) text_classification_with_gru.py
The goal of this script is the same as for the previous two
scripts --- neural learning for automatic classification of
product reviews. However, now we use a GRU (Gated Recurrent Unit)
to remediate the problems that would otherwise be caused by
vanishing gradients in the long chains of dependencies created by
feedback.
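The vanishing-gradient effect that scripts (10) through (12) are concerned with can be seen even in a scalar recurrence. The following is a minimal sketch, assuming a single-unit recurrence h_t = tanh(w * h_{t-1} + x_t) with a recurrent weight of magnitude less than 1; it is not the TEXTnet or GRU code from the module, and the function name, the weight, and the toy input sequence are hypothetical. By the chain rule, the derivative of the final state with respect to the initial state is a product of per-step factors, each smaller than 1 in magnitude.

```python
import math

def gradient_through_chain(num_steps, w=0.9, inputs_scale=0.5):
    """Return d h_T / d h_0 for h_t = tanh(w * h_{t-1} + x_t) over num_steps steps."""
    hidden = 0.0
    gradient = 1.0
    for t in range(num_steps):
        x_t = inputs_scale * math.sin(t)   # toy input sequence
        hidden = math.tanh(w * hidden + x_t)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t ** 2)
        gradient *= w * (1.0 - hidden ** 2)
    return gradient

for steps in (5, 20, 80):
    print(steps, abs(gradient_through_chain(steps)))
```

The printed magnitudes shrink as the chain gets longer, which is why the early words of a long review contribute little to learning unless gating of the kind used in a GRU is introduced.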
ExamplesAdversarialNetworks DIRECTORY
The ExamplesAdversarialNetworks directory of the distribution contains
the following scripts for demonstrating adversarial learning for data
modeling:
1. dcgan_multiobj_DG1.py
2. dcgan_multiobj_smallmod_DG2.py
3. wgan_multiobj_CG1.py
The first script demonstrates the DCGAN logic on the PurdueShapes5GAN
dataset. In order to show the sensitivity of the basic DCGAN logic to
any variations in the network or the weight initializations, the second
script introduces a small change in the network. The third script is a
demonstration of using the Wasserstein distance for data modeling
through adversarial learning. The PurdueShapes5GAN dataset consists of
64x64 images with randomly shaped, randomly positioned, and randomly
colored shapes.
The results produced by these scripts (for the constructor options
shown in the scripts) are included in a subdirectory named
RVLCloud_based_results. If you are just becoming familiar with the
AdversarialNetworks class of DLStudio, I'd urge you to run the scripts
with the constructor options as shown and to compare your results with
those that are in the RVLCloud_based_results directory.
THE DATASETS INCLUDED
[must be downloaded separately]
FOR THE MAIN DLStudio MODULE
Download the dataset archive 'datasets_for_DLStudio.tar.gz' through
the link "Download the image datasets for the main DLStudio Class"
provided at the top of this documentation page and store it in the
'Examples' directory of the distribution. Subsequently, execute the
following command in the 'Examples' directory:
cd Examples
tar zxvf datasets_for_DLStudio.tar.gz
This command will create a 'data' subdirectory in the 'Examples'
directory and deposit the datasets mentioned below in that
subdirectory.
OBJECT DETECTION AND LOCALIZATION
Training a CNN for object detection and localization requires training
and testing datasets that come with bounding-box annotations. This
module comes with the PurdueShapes5 dataset for that purpose. I
created this small-image-format dataset out of my admiration for the
CIFAR-10 dataset as an educational tool for demonstrating
classification networks in a classroom setting. You will find the
following dataset archive files in the "data" subdirectory of the
"Examples" directory of the distro:
(1) PurdueShapes5-10000-train.gz
PurdueShapes5-1000-test.gz
(2) PurdueShapes5-20-train.gz
PurdueShapes5-20-test.gz
The number that follows the main name string "PurdueShapes5-" is for
the number of images in the dataset. You will find the last two
datasets, with 20 images each, useful for debugging your logic for
object detection and bounding-box regression.
As to how the image data is stored in the archives, please see the main
comment block for the inner class CustomDataLoading in this file.
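According to that comment block, each image is stored as a list [R, G, B, Bbox, Label] and the dataset is serialized with Python's pickle module and then compressed with the gzip module. The following minimal sketch shows that round trip in memory. The record contents, the use of a string label, and the choice of a plain list as the top-level container are assumptions made for illustration; the actual archives may organize the records differently.

```python
import gzip
import pickle

# One hypothetical 32x32 record in the documented [R, G, B, Bbox, Label] layout.
num_pixels = 32 * 32
record = [
    [255] * num_pixels,   # R values for all 1024 pixels
    [0] * num_pixels,     # G values
    [0] * num_pixels,     # B values
    [4, 6, 20, 25],       # Bbox as [x1, y1, x2, y2]
    "rectangle",          # Label (assumed here to be a string)
]

# Serialize with pickle, then compress with gzip (done in memory here).
compressed = gzip.compress(pickle.dumps([record]))

# Reading reverses the two steps.
restored = pickle.loads(gzip.decompress(compressed))
R, G, B, bbox, label = restored[0]
print(len(R), bbox, label)
```

Since unpickling can execute arbitrary code, this approach should only be used on archive files obtained from a trusted source.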
OBJECT DETECTION AND LOCALIZATION WITH NOISE-CORRUPTED IMAGES
In terms of how the image data is stored in the dataset files, this
dataset is no different from the PurdueShapes5 dataset described above.
The only difference is that we now add varying degrees of noise to the
images to make it more challenging for both classification and
regression.
The archive files you will find in the 'data' subdirectory of the
'Examples' directory for this dataset are:
(3) PurdueShapes5-10000-train-noise-20.gz
PurdueShapes5-1000-test-noise-20.gz
(4) PurdueShapes5-10000-train-noise-50.gz
PurdueShapes5-1000-test-noise-50.gz
(5) PurdueShapes5-10000-train-noise-80.gz
PurdueShapes5-1000-test-noise-80.gz
In the names of these six archive files, the numbers 20, 50, and 80
stand for the level of noise in the images. For example, 20 means 20%
noise. The percentage level indicates the fraction of the color value
range that is added as randomly generated noise to the images. The
first integer in the name of each archive carries the same meaning as
mentioned above for the regular PurdueShapes5 dataset: It stands for
the number of images in the dataset.
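As a rough illustration of what a noise level such as 20% means, the sketch below perturbs 8-bit color values with random noise whose span is the stated percentage of the 0-255 range. This is an assumption-laden sketch and not the code used to generate the datasets: the function add_noise is hypothetical, and the choices of zero-mean uniform noise and of clipping to the valid range are assumptions.

```python
import random

def add_noise(channel_values, noise_percent, seed=0):
    """Add uniform noise spanning noise_percent of the 0-255 range, then clip."""
    rng = random.Random(seed)
    amplitude = 255 * noise_percent / 100.0
    noisy = []
    for value in channel_values:
        # Assumption: zero-mean uniform noise; the dataset's exact noise model may differ.
        perturbed = value + rng.uniform(-amplitude / 2, amplitude / 2)
        noisy.append(min(255, max(0, round(perturbed))))
    return noisy

clean = [0, 64, 128, 192, 255]
for level in (20, 50, 80):
    print(level, add_noise(clean, level))
```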
SEMANTIC SEGMENTATION
Showing interesting results with semantic segmentation requires images
that contain multiple objects of different types. A good semantic
segmenter would then allow for each object type to be segmented out
separately from an image. A network that can carry out such
segmentation needs training and testing datasets in which the images
come with multiple objects of different types in them. Towards that
end, I have created the following dataset:
(6) PurdueShapes5MultiObject-10000-train.gz
PurdueShapes5MultiObject-1000-test.gz
(7) PurdueShapes5MultiObject-20-train.gz
PurdueShapes5MultiObject-20-test.gz
The number that follows the main name string
"PurdueShapes5MultiObject-" is for the number of images in the dataset.
You will find the last two datasets, with 20 images each, useful for
debugging your logic for semantic segmentation.
As to how the image data is stored in the archive files listed above,
please see the main comment block for the class
PurdueShapes5MultiObjectDataset
As explained there, in addition to the RGB values at the pixels that
are stored in the form of three separate lists called R, G, and B, the
shapes themselves are stored in the form of an array of masks, each of
size 64x64, with each mask array representing a particular shape. For
illustration, the rectangle shape is represented by the first such
array. And so on.
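The relationship between the flat color lists and the per-shape mask arrays can be sketched as follows. This is a hypothetical illustration, not code from PurdueShapes5MultiObjectDataset: a 4x4 toy image stands in for the 64x64 images, the function name is made up, and the sketch assumes that the flat lists are in row-by-row order and that a nonzero mask entry marks a pixel belonging to the shape.

```python
def pixels_for_shape(channel, mask, width):
    """Keep the values of a flat channel list where the 2-D mask is nonzero."""
    kept = []
    for row_index, mask_row in enumerate(mask):
        for col_index, flag in enumerate(mask_row):
            value = channel[row_index * width + col_index]
            kept.append(value if flag else 0)
    return kept

# A 4x4 toy image stands in for the 64x64 images of the dataset.
width = 4
R = list(range(16))                 # flat list of red values, row by row
rectangle_mask = [[0, 0, 0, 0],
                  [0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 0]]
print(pixels_for_shape(R, rectangle_mask, width))
```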
TEXT CLASSIFICATION
My experiments tell me that, when using gated RNNs, the size of the
vocabulary can significantly impact the time it takes to train a neural
network for text modeling and classification. My goal was to provide
curated datasets extracted from the Amazon user-feedback archive that
would lend themselves to experimentation on, say, your personal laptop
with a rudimentary GPU like the Quadro. Here are the new datasets you
can now download from the main documentation page for this module:
(8) sentiment_dataset_train_200.tar.gz vocab_size = 43,285
sentiment_dataset_test_200.tar.gz
(9) sentiment_dataset_train_40.tar.gz vocab_size = 17,001
sentiment_dataset_test_40.tar.gz
(10) sentiment_dataset_train_400.tar.gz vocab_size = 64,350
sentiment_dataset_test_400.tar.gz
As with the other datasets, the integer in the name of each dataset is
the number of reviews collected from the 'positive.reviews' and the
'negative.reviews' files for each product category. Therefore, the
dataset with 200 in its name has a total of 400 reviews for each
product category. Also provided are two datasets named
"sentiment_dataset_train_3.tar.gz" and "sentiment_dataset_test_3.tar.gz"
just for the purpose of debugging your code.
The last dataset, the one with 400 in its name, was added in Version
1.1.3 of the module.
FOR THE ADVERSARIAL NETWORKS CLASS
Download the dataset archive
datasets_for_AdversarialNetworks.tar.gz
through the link "Download the image dataset for
AdversarialNetworks" provided at the top of the HTML version of
this doc page and store it in the 'ExamplesAdversarialNetworks'
directory of the distribution. Subsequently, execute the following
command in the directory 'ExamplesAdversarialNetworks':
tar zxvf datasets_for_AdversarialNetworks.tar.gz
This command will create a 'dataGAN' subdirectory and deposit the
following dataset archive in that subdirectory:
PurdueShapes5GAN-20000.tar.gz
Now execute the following in the "dataGAN" directory:
tar zxvf PurdueShapes5GAN-20000.tar.gz
With that, you should be able to execute the adversarial learning
based scripts in the 'ExamplesAdversarialNetworks' directory.
BUGS
Please notify the author if you encounter any bugs. When sending
email, please place the string 'DLStudio' in the subject line to get
past the author's spam filter.
ACKNOWLEDGMENTS
Thanks to Praneet Singh and Noureldin Hendy for their comments related
to the buggy behavior of the module when using the 'depth' parameter to
change the size of a network. Thanks also go to Christina Eberhardt for
reminding me that I needed to change the value of the 'dataroot'
parameter in my Examples scripts prior to packaging a new distribution.
Their feedback led to Version 1.1.1 of this module. Regarding the
changes made in Version 1.1.4, one of them is a fix for the bug found
by Serdar Ozguc in Version 1.1.3. Thanks Serdar.
Version 2.0.3: I owe thanks to Ankit Manerikar for many wonderful
conversations related to the rapidly evolving area of generative
adversarial networks in deep learning. It is obviously important to
read research papers to become familiar with the goings-on in an area.
However, if you wish to also develop deep intuitions in those concepts,
nothing can beat having great conversations with a strong researcher
like Ankit. Ankit is finishing his Ph.D. in the Robot Vision Lab at
Purdue.
ABOUT THE AUTHOR
The author, Avinash Kak, is a professor of Electrical and Computer
Engineering at Purdue University. For all issues related to this
module, contact the author at kak@purdue.edu. If you send email, please
place the string "DLStudio" in your subject line to get past the
author's spam filter.
COPYRIGHT
Python Software Foundation License
Copyright 2021 Avinash Kak
Classes
- builtins.object
  - DLStudio
class DLStudio(builtins.object)
DLStudio(*args, **kwargs)
Methods defined here:
- __init__(self, *args, **kwargs)
- Initialize self. See help(type(self)) for accurate signature.
- build_convo_layers(self, configs_for_all_convo_layers)
- build_fc_layers(self)
- check_a_sampling_of_images(self)
- Displays the first batch_size number of images in your dataset.
- display_tensor_as_image(self, tensor, title='')
- This method converts the argument tensor into a photo image that you can display
in your terminal screen. It can convert tensors of three different shapes
into images: (3,H,W), (1,H,W), and (H,W), where H, for height, stands for the
number of pixels in the vertical direction and W, for width, for the same
along the horizontal direction. When the first element of the shape is 3,
that means that the tensor represents a color image in which each pixel in
the (H,W) plane has three values for the three color channels. On the other
hand, when the first element is 1, that stands for a tensor that will be
shown as a grayscale image. And when the shape is just (H,W), that is
automatically taken to be for a grayscale image.
- imshow(self, img)
- called by display_tensor_as_image() for displaying the image
- load_cifar_10_dataset(self)
- We make sure that the transformations applied to the images end with the images being normalized.
Consider this call to normalize: "Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))". The three
numbers in the first tuple affect the means in the three color channels and the three
numbers in the second tuple affect the standard deviations. In this case, we want the
image value in each channel to be changed to:
image_channel_val = (image_channel_val - mean) / std
So with mean and std both set to 0.5 for all three channels, if the image tensor originally
was between 0 and 1.0, after this normalization, the tensor will be between -1.0 and +1.0.
If needed we can do inverse normalization by
image_channel_val = (image_channel_val * std) + mean
- load_cifar_10_dataset_with_augmentation(self)
- In general, we want to do data augmentation for training:
- parse_config_string_for_convo_layers(self)
- Each collection of 'n' otherwise identical layers in a convolutional network is
specified by a string that looks like:
"nx[a,b,c,d]-MaxPool(k)"
where
n = num of this type of convo layer
a = number of out_channels [in_channels determined by prev layer]
b,c = kernel for this layer is of size (b,c) [b along height, c along width]
d = stride for convolutions
k = maxpooling over kxk patches with stride of k
Example:
"n1x[a1,b1,c1,d1]-MaxPool(k1) n2x[a2,b2,c2,d2]-MaxPool(k2)"
- plot_loss(self)
- run_code_for_testing(self, net)
- run_code_for_training(self, net)
- save_model(self, model)
- Save the trained model to a disk file
- show_network_summary(self, net)
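The normalization arithmetic described above for load_cifar_10_dataset() can be checked with a few lines of plain Python. This is a minimal sketch of the two formulas applied to single channel values, with mean and std both set to 0.5 as in the documented call; the function names normalize and denormalize are hypothetical and the code does not use Torchvision.

```python
def normalize(value, mean=0.5, std=0.5):
    """Forward mapping: (value - mean) / std."""
    return (value - mean) / std

def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping: value * std + mean."""
    return value * std + mean

for pixel in (0.0, 0.25, 0.5, 1.0):
    mapped = normalize(pixel)
    print(pixel, "->", mapped, "->", denormalize(mapped))
```

The endpoints 0.0 and 1.0 map to -1.0 and +1.0, and applying the inverse recovers the original values.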
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes defined here:
- AutogradCustomization = <class 'DLStudio.DLStudio.AutogradCustomization'>
- This class illustrates how you can add additional functionality to Autograd by
following the instructions posted at
https://pytorch.org/docs/stable/notes/extending.html
- CustomDataLoading = <class 'DLStudio.DLStudio.CustomDataLoading'>
- This is a testbed for experimenting with a completely ground-up attempt at
designing a custom data loader. Ordinarily, if the basic format of how the
dataset is stored is similar to one of the datasets that the Torchvision
module knows about, you can go ahead and use that for your own dataset. At
worst, you may need to carry out some light customizations depending on the
number of classes involved, etc.
However, if the underlying dataset is stored in a manner that does not look
like anything in Torchvision, you have no choice but to supply all of the
data loading infrastructure yourself. That is what this inner class of the
DLStudio module is all about.
The custom data loading exercise here is related to a dataset called
PurdueShapes5 that contains 32x32 images of binary shapes belonging to the
following five classes:
1. rectangle
2. triangle
3. disk
4. oval
5. star
The dataset was generated by randomizing the sizes and the orientations
of these five patterns. Since the patterns are rotated with a very simple
non-interpolating transform, just the act of random rotations can introduce
boundary and even interior noise in the patterns.
Each 32x32 image is stored in the dataset as the following list:
[R, G, B, Bbox, Label]
where
R : is a 1024 element list of the values for the red component
of the color at all the pixels
G : the same as above but for the green component of the color
B : the same as above but for the blue component of the color
Bbox : a list like [x1,y1,x2,y2] that defines the bounding box
for the object in the image
Label : the shape of the object
I serialize the dataset with Python's pickle module and then compress it with
the gzip module.
You will find the following dataset archive files in the "data" subdirectory
of Examples in the DLStudio distro:
PurdueShapes5-10000-train.gz
PurdueShapes5-1000-test.gz
PurdueShapes5-20-train.gz
PurdueShapes5-20-test.gz
The number that follows the main name string "PurdueShapes5-" is for the
number of images in the dataset.
You will find the last two datasets, with 20 images each, useful for debugging
your logic for object detection and bounding-box regression.
- DetectAndLocalize = <class 'DLStudio.DLStudio.DetectAndLocalize'>
- The purpose of this inner class is to focus on object detection in images --- as
opposed to image classification. Most people would say that object detection
is a more challenging problem than image classification because, in general,
the former also requires localization. The simplest interpretation of what
is meant by localization is that the code that carries out object detection
must also output a bounding-box rectangle for the object that was detected.
You will find in this inner class some examples of LOADnet classes meant
for solving the object detection and localization problem. The acronym
"LOAD" in "LOADnet" stands for
"LOcalization And Detection"
The different network examples included here are LOADnet1, LOADnet2, and
LOADnet3. For now, only pay attention to LOADnet2 since that's the class I
have worked with the most for the 1.0.7 distribution.
- ExperimentsWithCIFAR = <class 'DLStudio.DLStudio.ExperimentsWithCIFAR'>
- ExperimentsWithSequential = <class 'DLStudio.DLStudio.ExperimentsWithSequential'>
- Demonstrates how to use the torch.nn.Sequential container class
- Net = <class 'DLStudio.DLStudio.Net'>
- SemanticSegmentation = <class 'DLStudio.DLStudio.SemanticSegmentation'>
- The purpose of this inner class is to be able to use the DLStudio module for
experiments with semantic segmentation. At its simplest level, the
purpose of semantic segmentation is to assign correct labels to the
different objects in a scene, while localizing them at the same time. At
a more sophisticated level, a system that carries out semantic
segmentation should also output a symbolic expression based on the objects
found in the image and their spatial relationships with one another.
The workhorse of this inner class is the mUnet network that is based
on the UNET network that was first proposed by Ronneberger, Fischer and
Brox in the paper "U-Net: Convolutional Networks for Biomedical Image
Segmentation". Their Unet extracts binary masks for the cell pixel blobs
of interest in biomedical images. The output of their Unet can
therefore be treated as a pixel-wise binary classifier at each pixel
position. The mUnet class, on the other hand, is intended for
segmenting out multiple objects simultaneously from an image. [A weaker
reason for "Multi" in the name of the class is that it uses skip
connections not only across the two arms of the "U", but also along
the arms. The skip connections in the original Unet are only between the
two arms of the U.] In mUnet, each object type is assigned a separate
channel in the output of the network.
This version of DLStudio also comes with a new dataset,
PurdueShapes5MultiObject, for experimenting with mUnet. Each image in
this dataset contains a random number of selections from five different
shapes, with the shapes being randomly scaled, oriented, and located in
each image. The five different shapes are: rectangle, triangle, disk,
oval, and star.
- SkipConnections = <class 'DLStudio.DLStudio.SkipConnections'>
- This educational class is meant for illustrating the concepts related to the
use of skip connections in neural networks. It is now well known that deep
networks are difficult to train because of the vanishing gradients problem.
What that means is that as the depth of a network increases, the loss gradients
calculated for the early layers become more and more muted, which suppresses
the learning of the parameters in those layers. An important mitigation
strategy for addressing this problem consists of creating a CNN using blocks
with skip connections.
With the code shown in this inner class of the module, you can now experiment
with skip connections in a CNN to see how a deep network with this feature
might improve the classification results. As you will see in the code shown
below, the network that allows you to construct a CNN with skip connections
is named BMEnet. As shown in the script playing_with_skip_connections.py in
the Examples directory of the distribution, you can easily create a CNN with
arbitrary depth just by using the "depth" constructor option for the BMEnet
class. The basic block of the network constructed by BMEnet is called
SkipBlock which, very much like the BasicBlock in ResNet-18, has a couple of
convolutional layers whose output is combined with the input to the block.
Note that the value given to the "depth" constructor option for the
BMEnet class does NOT translate directly into the actual depth of the
CNN. [Again, see the script playing_with_skip_connections.py in the Examples
directory for how to use this option.] The value of "depth" is translated
into how many instances of SkipBlock to use for constructing the CNN.
- TextClassification = <class 'DLStudio.DLStudio.TextClassification'>
- The purpose of this inner class is to be able to use the DLStudio module for simple
experiments in text classification. Consider, for example, the problem of automatic
classification of variable-length user feedback: you want to create a neural network
that can label an uploaded product review of arbitrary length as positive or negative.
One way to solve this problem is with a recurrent neural network in which you use a
hidden state for characterizing a variable-length product review with a fixed-length
state vector. This inner class allows you to carry out such experiments.
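The effect of skip connections described in the SkipConnections entry above can be illustrated with a chain of scalar blocks. The following is a toy sketch under stated assumptions and not the BMEnet or SkipBlock code: each block's learned branch is modeled as f(x) = w * tanh(x) with a hypothetical weight w, and the quantity computed is the derivative of the chain's output with respect to its input. Without a skip connection each block contributes a factor f'(x); with one, the block computes x + f(x) and contributes 1 + f'(x).

```python
import math

def input_gradient(num_blocks, use_skip, w=0.5, x=0.3):
    """Gradient of the final output with respect to the input of a chain of scalar blocks."""
    gradient = 1.0
    for _ in range(num_blocks):
        branch = w * math.tanh(x)                     # f(x): the block's branch
        branch_grad = w * (1.0 - math.tanh(x) ** 2)   # f'(x)
        if use_skip:
            x = x + branch            # output = input + f(input)
            gradient *= 1.0 + branch_grad
        else:
            x = branch                # output = f(input)
            gradient *= branch_grad
    return gradient

for depth in (4, 16, 64):
    print(depth, input_gradient(depth, use_skip=False), input_gradient(depth, use_skip=True))
```

In the plain chain the gradient shrinks geometrically with depth, whereas the identity path in the skip version keeps every factor at 1 or above.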
Data
__author__ = 'Avinash Kak (kak@purdue.edu)'
__copyright__ = '(C) 2021 Avinash Kak. Python Software Foundation.'
__date__ = '2021-January-16'
__url__ = 'https://engineering.purdue.edu/kak/distDT/DLStudio-2.0.4.html'
__version__ = '2.0.4'
Author
Avinash Kak (kak@purdue.edu)