CS440/ECE448 Spring 2021

Assignment 3: Neural Nets and PyTorch

Due date: Wednesday March 17th, 11:59pm


Created by Justin Lizama, Kedan Li, and Tiantian Fang

Updated fall 2020 by Jatin Arora, Kedan Li, and Michal Shlapentokh-Rothman

Updated spring 2021 by Mahir Morshed and Yangge Li

The goal of this assignment is to employ neural networks, nonlinear and multi-layer extensions of the linear perceptron, to detect whether or not images contain animals.

In the first part, you will create a 1980s-style shallow neural network. In the second part, you will improve this network using more modern techniques such as changing the activation function, changing the network architecture, or changing other initialization details.

You will be using the PyTorch and NumPy libraries to implement these models. The PyTorch library will do most of the heavy lifting for you, but it is still up to you to implement the right high-level instructions to train the model.



The dataset consists of 10000 32x32 colored images (a subset of the CIFAR-10 dataset, provided by Alex Krizhevsky), split for you into 7500 training examples (of which 2999 are negative and 4501 are positive) and 2500 development examples.

The data set can be downloaded here: (gzip) or (zip). When you uncompress this you'll find a binary object that our reader code will unpack for you.

Part 1: Classical Shallow Network

The basic neural network model consists of a sequence of hidden layers sandwiched between an input layer and an output layer. Data is fed in at the input layer, passed through the hidden layers, and read out at the output layer. Every neural network thus induces a function \(F_{W}\), given by propagating the data through the layers.

To make things more precise, in lecture you learned of a function \( f_{w}(x) = \sum_{i=1}^n w_i x_i + b\). In this assignment, given weight matrices \(W_1,W_2\) with \(W_1 \in \mathbb{R}^{h \times d}\), \(W_2 \in \mathbb{R}^{2 \times h}\) and bias vectors \(b_1 \in \mathbb{R}^{h}\) and \(b_2 \in \mathbb{R}^{2}\), you will learn a function \( F_{W} \) defined as \[ F_{W} (x) = W_2\sigma(W_1 x + b_1) + b_2 \] where \(\sigma\) is your activation function. In part 1, you should use either the sigmoid or the ReLU activation function. You will use 32 hidden units (\(h=32\)) and 3072 input units, one for each channel of each pixel in an image (\(d=(32)^2(3) = 3072\)).
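As a sketch, the function \(F_W\) above can be expressed with PyTorch's nn module roughly as follows. The class and variable names here are illustrative only, not the ones required by the provided skeleton:

```python
import torch
import torch.nn as nn

class ShallowNet(nn.Module):
    """Two-layer network: F_W(x) = W2 * sigma(W1 x + b1) + b2."""
    def __init__(self, in_size=3072, h=32, out_size=2):
        super().__init__()
        self.fc1 = nn.Linear(in_size, h)   # W1 in R^{h x d}, b1 in R^h
        self.fc2 = nn.Linear(h, out_size)  # W2 in R^{2 x h}, b2 in R^2
        self.sigma = nn.ReLU()             # or nn.Sigmoid()

    def forward(self, x):
        # x is a batch of flattened images, shape (batch, 3072)
        return self.fc2(self.sigma(self.fc1(x)))

net = ShallowNet()
x = torch.randn(4, 3072)   # a batch of 4 flattened images
print(net(x).shape)        # torch.Size([4, 2])
```

Note that `nn.Linear` bundles the weight matrix and bias vector together, so you never construct \(W_1, b_1, W_2, b_2\) by hand.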

Training and Development

With the model design described above, you should expect a dev-set accuracy of around 0.84.
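A single training step for a model like the one in part 1 might look like the sketch below; `train_step` is a hypothetical helper, and the actual structure you must follow comes from the provided skeleton:

```python
import torch
import torch.nn as nn

# A stand-in for the part 1 model.
model = nn.Sequential(nn.Linear(3072, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()  # standard loss for classification
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(x, y):
    """Run one gradient-descent step on a batch (x, y); return the loss."""
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass + mean cross-entropy loss
    loss.backward()              # backpropagate to compute gradients
    optimizer.step()             # update the weights
    return loss.item()

# For example, on a random batch of 8 examples with 0/1 labels:
x = torch.randn(8, 3072)
y = torch.randint(0, 2, (8,))
loss = train_step(x, y)
```

The zero_grad/backward/step pattern is the same for both parts of the assignment; only the model and hyperparameters change.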

Part 2: Modern Network

In this part, you will try to improve your performance by employing more modern machine learning techniques. These include, but are not limited to, the following:
  1. Choice of activation function: Some possible candidates include Tanh, ELU, softplus, and LeakyReLU. You may find that choosing the right activation function will lead to significantly faster convergence, improved performance overall, or even both.
  2. L2 Regularization: Regularization refers to techniques that improve a model's ability to generalize to unseen examples. One commonly used form is L2 regularization. Let \(\mathcal{R}(W)\) be the empirical risk (mean loss). You can implement L2 regularization by adding an additional term that penalizes the norm of the weights. More precisely, your new empirical risk becomes \[\mathcal{R}_\lambda(W) := \mathcal{R}(W) + \lambda \sum_{w \in P} \Vert w \Vert_2 ^2\] where \(P\) is the set of all your parameters and \(\lambda\) (usually small) is some hyperparameter you choose. There are several other techniques besides L2 regularization for improving the generalization of your model, such as dropout or batch normalization.
  3. Network Depth and Width: The sort of network you implemented in part 1 is a two-layer network because it uses two weight matrices. Sometimes it helps performance to add more hidden units or add more weight matrices to obtain greater representation power and make training easier.
  4. Using Convolutional Neural Networks: While it is possible to obtain nice results with traditional multilayer perceptrons, when doing image classification tasks it is often best to use convolutional neural networks, which are tailored specifically to signal processing tasks such as image recognition. See if you can improve your results using convolutional layers in your network.
Try to employ some of these techniques in order to attain a dev-set accuracy of approximately 0.87. The only stipulation is that your model use at most 500,000 total parameters: counting every floating-point value in all of your weights, including bias terms, the total must not exceed 500,000.
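A possible part 2 architecture combining a few of these ideas (convolutional layers, LeakyReLU, and L2 regularization via the optimizer's weight_decay option) is sketched below. The layer sizes are illustrative, not prescribed, and the final snippet shows how to check the 500,000-parameter budget:

```python
import torch
import torch.nn as nn

# Small CNN for 3x32x32 CIFAR-style inputs.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3x32x32 -> 16x32x32
    nn.LeakyReLU(),
    nn.MaxPool2d(2),                              # -> 16x16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x16x16
    nn.LeakyReLU(),
    nn.MaxPool2d(2),                              # -> 32x8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 64),
    nn.LeakyReLU(),
    nn.Linear(64, 2),
)

# weight_decay applies an L2 penalty of lambda * ||w||^2 to every parameter.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

# Check the parameter budget: every float in weights and biases counts.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # must be at most 500,000
```

Note that a convolutional model expects its input reshaped to (batch, 3, 32, 32) rather than flattened to 3072 values, so your forward pass must reshape accordingly.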

Some things to look for:

  1. The autograder runs the training process for 500 batches (max_iter=500). This is done so that we have a consistent training process for each evaluation and comparison with benchmarks/threshold accuracies.
  2. You still have one thing fully in your control, however: the learning rate. If you are confident about a model you implemented but cannot pass the accuracy thresholds on Gradescope, you can try increasing the learning rate, since your model might simply do better with more effective training within the fixed 500 batches. Be mindful, however, that a very high learning rate can also hurt performance, since the model may begin to oscillate around the optimum.

Provided Code Skeleton

We have provided (tar/zip) all the code to get you started on your MP, which means you will only have to implement the PyTorch neural network model.

The only files you will need to modify are neuralnet_part1.py and neuralnet_part2.py.

To learn more about how to run the MP, run python3 mp3.py -h in your terminal.

You should definitely use the PyTorch documentation, linked multiple times on this page, to help you with implementation details. You can also use this PyTorch Tutorial as a reference to help you with your implementation. There are also other guides out there such as this one.


This MP will be submitted via Gradescope; please upload neuralnet_part1.py (for part 1) and neuralnet_part2.py (for part 2).

Extra credit: CIFAR-100 superclasses

For an extra 10% worth of the points on this MP, your task will be to pick any two superclasses from the CIFAR-100 dataset (described in the same place as CIFAR-10) and rework your neural net from part 2, if necessary, to distinguish between those two superclasses. A superclass contains 2500 training images and 500 testing images, so between two superclasses you will be working with 3/5 the amount of data in total (6000 total images here versus 10000 total in the main MP).

You can download the CIFAR-100 data here and extract it to the same place where you've placed the data for the main MP. A custom reader for it is provided here; to use it with the CIFAR-100 data, you should rename this to reader.py and replace the existing file of that name in your working directory.

To set up your code for the extra credit, you must do the following:

The points for the extra credit are distributed as follows: