CS440/ECE448 Spring 2021
Assignment 3: Neural Nets and PyTorch
Due date: Wednesday March 17th, 11:59pm
Created by Justin Lizama, Kedan Li, and Tiantian Fang
Updated fall 2020 by Jatin Arora, Kedan Li, and Michal Shlapentokh-Rothman
Updated spring 2021 by Mahir Morshed and Yangge Li
The goal of this assignment is to employ neural networks, nonlinear and multi-layer extensions of the linear perceptron, to detect whether or not images contain animals.
In the first part, you will create a 1980s-style shallow neural network. In the second part, you will improve this network using more modern techniques, such as changing the activation function, changing the network architecture, or changing other initialization details.
You will be using the PyTorch and NumPy libraries to implement these models. The PyTorch library will do most of the heavy lifting for you, but it is still up to you to implement the right high-level instructions to train the model.
The dataset consists of 10000 32x32 colored images (a subset of the CIFAR-10 dataset, provided by Alex Krizhevsky), split for you into 7500 training examples (of which 2999 are negative and 4501 are positive) and 2500 development examples.
The data set can be downloaded here: (gzip) or (zip). When you uncompress this you'll find a binary object that our reader code will unpack for you.
To make things more precise, in lecture you learned of a function \( f_{w}(x) = \sum_{i=1}^n w_i x_i + b\). In this assignment, given weight matrices \(W_1 \in \mathbb{R}^{h \times d}\) and \(W_2 \in \mathbb{R}^{2 \times h}\) and bias vectors \(b_1 \in \mathbb{R}^{h}\) and \(b_2 \in \mathbb{R}^{2}\), you will learn a function \( F_{W} \) defined as \[ F_{W} (x) = W_2\sigma(W_1 x + b_1) + b_2 \] where \(\sigma\) is your activation function. In part 1, you should use either the sigmoid or ReLU activation function. You will use 32 hidden units (\(h=32\)) and 3072 input units, one for each channel of each pixel in an image (\(d=(32)^2(3) = 3072\)).
Notice that because PyTorch's CrossEntropyLoss incorporates a log-softmax (followed by the negative log-likelihood), you do not need to explicitly include an activation function in the last layer of your network.
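To make the shapes concrete, here is a small self-contained sketch (using hypothetical random data, not the assignment's reader output) of evaluating \(F_W\) with explicit tensors and feeding the raw, unactivated output straight to CrossEntropyLoss:

```python
import torch

d, h = 3072, 32                  # input and hidden sizes from the assignment
x = torch.randn(4, d)            # hypothetical batch of 4 flattened images
y = torch.tensor([0, 1, 1, 0])   # hypothetical class labels

# Parameters with the shapes described above: W1 is h x d, W2 is 2 x h
W1, b1 = torch.randn(h, d), torch.zeros(h)
W2, b2 = torch.randn(2, h), torch.zeros(2)

# F_W(x) = W2 * sigma(W1 x + b1) + b2, with ReLU as sigma
logits = torch.relu(x @ W1.T + b1) @ W2.T + b2   # shape (4, 2)

# CrossEntropyLoss applies log-softmax itself, so the raw logits go in
loss = torch.nn.CrossEntropyLoss()(logits, y)
```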
We have provided (tar/zip) all the code to get you started on your MP, which means you will only have to implement the PyTorch neural network model.
__init__() is where you will need to construct the network architecture. There are multiple ways to do this. One way is to use Linear and Sequential objects. Keep in mind that Linear uses a Kaiming He uniform initialization for the weight matrices (and a related uniform scheme, not zeros, for the bias terms). Another way is to explicitly define the weight matrices and bias terms as Tensors. This approach is more hands-on and will allow you to choose your own initialization. For this assignment, however, Kaiming He uniform initialization should suffice and should be a good choice.

get_parameters()
should do what its name suggests, namely return a list of the parameters used in the model. (This and set_parameters() will only be tested with respect to part 1, but you may find implementing and using them helpful for part 2.)

set_parameters() should likewise do what its name suggests: set the parameters of the model based on those passed into this method. For consistency's sake, the order of the parameters should be the same as that returned by get_parameters(). (This and get_parameters() will only be tested with respect to part 1, but you may find implementing and using them helpful for part 2.)
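A minimal sketch of this pair of methods, assuming (as one possible layout) that the layers live in a Sequential attribute; the class and attribute names here are illustrative, not the template's:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # stand-in for the assignment's NeuralNet
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(3072, 32), nn.ReLU(), nn.Linear(32, 2))

    def get_parameters(self):
        # Return the parameters in a fixed, reproducible order
        return list(self.model.parameters())

    def set_parameters(self, params):
        # Copy values in the same order get_parameters() returns them
        with torch.no_grad():
            for p, new in zip(self.model.parameters(), params):
                p.copy_(new)
```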
forward() should perform a forward pass through your network. This means it should explicitly evaluate \(F_{W}(x)\). This can be done by simply calling your Sequential object defined in __init__() or (if you opted to define tensors explicitly) by multiplying your data through the weight matrices.
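As one possible shape for these two methods together, here is a sketch using Linear and Sequential objects (the constructor signature is an assumption; match whatever the provided template actually declares):

```python
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self, in_size=3072, out_size=2):
        super().__init__()
        # Linear applies Kaiming He uniform initialization by default
        self.model = nn.Sequential(
            nn.Linear(in_size, 32),   # W1 x + b1
            nn.ReLU(),                # sigma
            nn.Linear(32, out_size),  # W2 (.) + b2, no final activation
        )

    def forward(self, x):
        # Evaluates F_W(x); returns raw logits for CrossEntropyLoss
        return self.model(x)
```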
step() should perform one iteration of training. This means it should perform one gradient update through one batch of training data (not the entire set of training data). You can do this either by calling loss_fn(yhat, y).backward() and then updating the weights directly yourself, or by using an optimizer object that you may have initialized in __init__() to help you update the network. Be sure to call zero_grad() on your optimizer in order to clear the gradient buffer. When you return the loss value from this function, make sure you return loss_value.item() (which works if it is just a single number) or loss_value.detach().cpu().numpy() (which separates the loss value from the computations that led up to it, moves it to the CPU (important if you decide to work locally on a GPU, bearing in mind that Gradescope is not configured with one), and then converts it to a NumPy array). This allows proper garbage collection to take place, lest your program exceed the memory limits fixed on Gradescope.
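A sketch of the optimizer-based variant of step(), assuming self.optimizer and self.loss_fn were created in __init__() (those attribute names are illustrative):

```python
import torch

def step(self, x, y):
    # One gradient update on a single batch (x, y)
    self.optimizer.zero_grad()   # clear stale gradients first
    yhat = self.forward(x)       # evaluate F_W(x)
    loss = self.loss_fn(yhat, y)
    loss.backward()              # backpropagate through this batch only
    self.optimizer.step()        # apply the update
    return loss.item()           # plain float, detached from the graph
```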
fit() takes as input the training data, the training labels, the development set, and the maximum number of iterations. The training data provided is the output from reader.py. The training labels are a Tensor consisting of the label corresponding to each image in the training data. The development set is the Tensor of images on which you are going to test your implementation. The maximum number of iterations is the number you specified with --max_iter (500 by default).

fit() outputs the predicted labels. It should construct a NeuralNet object, and iteratively call the neural net's step() to train the network. This should be done by feeding in batches of data determined by batch size. You will use a batch size of 100 for this assignment. max_iter is the number of batches (not the number of epochs!) in your training process.
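The batching logic can be sketched as a small helper that yields max_iter batches of 100, wrapping around the training set when necessary (this helper is illustrative and not part of the provided template):

```python
import torch

def make_batches(train_set, train_labels, batch_size=100, max_iter=500):
    """Yield max_iter (x, y) batches of batch_size, cycling through the data."""
    n = train_set.shape[0]
    for i in range(max_iter):
        start = (i * batch_size) % n
        idx = torch.arange(start, start + batch_size) % n  # wrap around
        yield train_set[idx], train_labels[idx]
```

Inside fit(), each yielded pair would then be handed to the network's step() method, so that max_iter counts batches rather than epochs.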
The only files you will need to modify are neuralnet_part1.py and neuralnet_part2.py.
To learn more about how to run the MP, run python3 mp3.py -h in your terminal.
You should definitely use the PyTorch documentation, linked multiple times on this page, to help you with implementation details. You can also use this PyTorch Tutorial as a reference to help you with your implementation. There are also other guides out there such as this one.
This MP will be submitted via Gradescope; please upload neuralnet_part1.py (for part 1) and neuralnet_part2.py (for part 2).
For an extra 10% worth of the points on this MP, your task will be to pick any two superclasses from the CIFAR-100 dataset (described in the same place as CIFAR-10) and rework your neural net from part 2, if necessary, to distinguish between those two superclasses. A superclass contains 2500 training images and 500 testing images, so between two superclasses you will be working with 3/5 the amount of data in total (6000 total images here versus 10000 total in the main MP).
You can download the CIFAR-100 data here and extract it to the same place where you've placed the data for the main MP. A custom reader for it is provided here; to use it with the CIFAR-100 data, you should rename this to reader.py and replace the existing file of that name in your working directory.
To set up your code for the extra credit, you must do the following:
The points for the extra credit are distributed as follows: