## Lab 7

In this machine problem, you will design an LSTM by hand to perform a specified task, then train it using gradient descent to perform the same task.

## The Task

You have a dataset object with input observations x[t] (loaded as self.observation[t]) and target outputs y[t] (loaded as self.label[t]). Your LSTM should fill in an activation matrix, self.activation, whose last column (self.activation[t,4]) contains the LSTM output, h[t].

Your task is to create an LSTM that performs the following task. For every time step, t:

• If self.observation[t]==0, output h[t]=0.
• If self.observation[t]>=1, output the number of time steps since the most recent earlier nonzero observation. More precisely:
  • If the most recent preceding nonzero observation was self.observation[s], output h[t]=t-s.
  • If self.observation[t] is the very first nonzero observation, output h[t]=t.
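The rules above can be made concrete with a short reference computation. The sketch below (the function name target_outputs is an assumption, not part of the distributed code) computes the desired outputs for an arbitrary observation sequence:

```python
import numpy as np

def target_outputs(observation):
    """Compute the desired LSTM outputs h[t] for the counting task.

    h[t] = 0 when observation[t] == 0; otherwise h[t] is the number of
    time steps since the most recent earlier nonzero observation, or t
    if observation[t] is the very first nonzero observation.
    """
    h = np.zeros(len(observation))
    last_nonzero = None  # index s of the most recent earlier nonzero observation
    for t, x in enumerate(observation):
        if x != 0:
            h[t] = t if last_nonzero is None else t - last_nonzero
            last_nonzero = t
    return h
```

For example, the sequence [0, 1, 0, 0, 1, 0] should produce the outputs [0, 1, 0, 0, 3, 0]: the first nonzero observation is at t=1 (so h[1]=1), and the next is at t=4, three steps later (so h[4]=3).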

## Training Epochs

You'll be tested for epochs -1, 0, 50, and 100. If you run visualize.py, it will run epochs -1 through 140, then plot an error convergence curve.

• If epoch==-1, create a model, self.model, with knowledge-based weights that perform the task perfectly, using the CReLU activation function (g(x)=max(0,min(1,x))).
• If epoch==0, create pseudo-random initial weights. Code for this is provided for you.
• If epoch>0, load an existing JSON model file. Code for this is provided for you.

In all three cases, you should then perform one update of gradient descent training. (Hint: if the model performs the task perfectly, the error is zero, so its gradient is also zero and the update leaves the weights unchanged.)
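The hint above can be checked numerically. A single gradient-descent step (a minimal sketch; the function name and learning rate below are assumptions, not the distributed interface) leaves a perfect model unchanged, because the gradient of a zero error is zero:

```python
import numpy as np

def one_gradient_update(model, gradient, learning_rate=0.01):
    """One step of gradient descent on a 4x3 weight matrix like self.model.

    If the model already performs the task perfectly, the error is zero,
    its gradient is zero, and this update returns the model unchanged.
    """
    return model - learning_rate * gradient
```

This is why the epoch==-1 case can both build the knowledge-based weights and run one training update: the update is a no-op at a perfect solution.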

## LSTM Definition

We'll use an LSTM defined exactly as in lecture (and on Wikipedia), except that (1) the cell nonlinearity (sigma_h) is the same as the gate nonlinearity (sigma_g), and (2) the error is the mean-squared-error, instead of the sum-squared-error. Thus:

```
i[t] = g(wi*x[t] + ui*h[t-1] + bi)
f[t] = g(wf*x[t] + uf*h[t-1] + bf)
o[t] = g(wo*x[t] + uo*h[t-1] + bo)
c[t] = f[t]*c[t-1] + i[t]*g(wc*x[t] + uc*h[t-1] + bc)
h[t] = o[t]*c[t]
```
and
```
error = 0.5*np.mean(np.square(self.activation[:,4]-self.label))
```
where
```
self.model = np.array([[bc,wc,uc],[bi,wi,ui],[bf,wf,uf],[bo,wo,uo]])
self.activation[t,:] = c[t], i[t], f[t], o[t], h[t]
```
For epoch==-1 (knowledge-based design), use the CReLU activation function, g(x) = max(0,min(1,x)), and limit the weights to [-1,1]. For epoch >= 0 (gradient descent), use the logistic activation function, g(x) = 1/(1+exp(-x)), with no limit on the weights. Both activation functions are provided for you in the function self.activation(x), and their derivatives are provided in the function self.derivative().
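Putting the pieces together, the recursion above can be sketched as a forward pass over the whole observation sequence (a minimal sketch, not the required submitted.py interface; the names crelu and lstm_forward are assumptions):

```python
import numpy as np

def crelu(x):
    """CReLU activation: g(x) = max(0, min(1, x))."""
    return np.clip(x, 0.0, 1.0)

def lstm_forward(observation, model, g=crelu):
    """Run the scalar LSTM recursion from the equations above.

    model rows are [b, w, u] for the c, i, f, o components, matching
    self.model = [[bc,wc,uc],[bi,wi,ui],[bf,wf,uf],[bo,wo,uo]].
    Returns a (T, 5) activation matrix with columns c, i, f, o, h,
    matching self.activation.
    """
    (bc, wc, uc), (bi, wi, ui), (bf, wf, uf), (bo, wo, uo) = model
    act = np.zeros((len(observation), 5))
    c_prev, h_prev = 0.0, 0.0
    for t, x in enumerate(observation):
        i = g(wi*x + ui*h_prev + bi)
        f = g(wf*x + uf*h_prev + bf)
        o = g(wo*x + uo*h_prev + bo)
        c = f*c_prev + i*g(wc*x + uc*h_prev + bc)
        h = o*c
        act[t, :] = c, i, f, o, h
        c_prev, h_prev = c, h
    return act
```

Note that the initial state c[-1] = h[-1] = 0 is an assumption here; check the skeleton code for the initialization it actually uses.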

## Files included in the distribution

• setup.sh, requirements.txt -- define the versions of python and numpy
• submitted.py -- skeleton code with comments
• run_tests.py, score.py, tests/test_sequence.py -- run the autograder tests
• debug.py -- runs submitted.py for whichever epoch you specify. If a solution for that epoch has been distributed to you, it loads the solution and computes the error between the last step of your output and the corresponding step of the distributed solution.
• visualize.py -- make PNG figures that might be useful to help you debug.
• data/* -- training observations and labels
• solutions/* -- complete solutions for epoch0 and epoch1, scoring hash files for many epochs, and PNG files computed by visualize.py from the correct solution

## What to submit

The file submitted.py, containing all of the functions that you have written.