Assignment Specifications

Linear Models Assignment

This assignment is worth 17% of your overall Homework and Programming Assignments grade.

In this assignment, you will implement the simple Logistic Regression algorithm for a single independent variable (IV) and the Perceptron algorithm for the general case of multiple independent variables. You may assume that the data is provided in n-column format (2 columns in the single-IV case), where the leading column(s) hold the IV(s) and the last column holds the dependent variable (DV). The first row contains the number of independent variables and the second row contains the names of the variables. Each subsequent row corresponds to a single observation, so you may need to aggregate the values of the DV (i.e., counts) for each value of the IV before doing the regression.
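
As an illustration of the data layout and the aggregation step described above, here is a minimal Python sketch. It rests on several assumptions that the specification does not state: fields are separated by commas or whitespace, the DV is coded 0/1, and the function names (parse_dataset, aggregate_single_iv) are hypothetical. The sample rows are made-up toy values, not the CHD data.

```python
import re
from collections import defaultdict

def parse_dataset(text):
    """Parse the layout: row 1 = number of IVs, row 2 = variable names,
    every later row = one observation (IVs first, DV last).
    Assumption: fields are separated by commas and/or whitespace."""
    rows = [re.split(r"[,\s]+", line.strip())
            for line in text.splitlines() if line.strip()]
    n_ivs = int(rows[0][0])
    names = rows[1]
    xs = [[float(v) for v in row[:n_ivs]] for row in rows[2:]]
    ys = [float(row[n_ivs]) for row in rows[2:]]
    return names, xs, ys

def aggregate_single_iv(xs, ys):
    """For a single IV, return {iv_value: (positives, total)}.
    Assumption: the DV is coded 0 or 1."""
    counts = defaultdict(lambda: [0, 0])
    for (x,), y in zip(xs, ys):
        counts[x][0] += int(y)   # observations with DV = 1
        counts[x][1] += 1        # all observations at this IV value
    return {x: tuple(c) for x, c in sorted(counts.items())}

# Toy data for illustration only.
sample = """1
Age CHD
25 0
25 0
35 1
35 0
45 1
45 1
"""

names, xs, ys = parse_dataset(sample)
groups = aggregate_single_iv(xs, ys)
# Observed probability of DV = 1 at each IV value.
probabilities = {x: pos / total for x, (pos, total) in groups.items()}
print(probabilities)
```

The per-value probabilities are what a regression on aggregated data would use as its targets. How to treat values whose observed probability is exactly 0 or 1 (where the log-odds are undefined) is left open here.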

  1. Implement Logistic Regression for a single IV.
    • Your program must:
      • Output the values of the intercept (w0) and the slope (w1) of the model.
      • Output the value of R2 from the probabilities (i.e., compute the SSE and TSS on probabilities rather than on odds).
  2. Use your algorithm on this (modified) Coronary Heart Disease (CHD) problem.
    • First, use simple linear regression (no logit function) to build a linear model of the data.
      • Report the parameters of the model (w0, w1 and R2).
      • Plot the original (probability) data and graph the linear model your program produced.
      • What does the model predict for the probability of someone 41 years old suffering from CHD?
    • Next, use logistic regression to build a nonlinear model of the data.
      • Report the parameters of the model (w0, w1 and R2).
      • Plot the original (probability) data and graph the nonlinear model your program produced.
      • What does the model predict for the probability of someone 41 years old suffering from CHD?
  3. Generalize your program to handle multiple independent variables by implementing the Perceptron Algorithm.
    • Your program must:
      • Output the values of the perceptron weights (w0, w1, ..., wn).
      • Output the value of R2 (i.e., compute the SSE and TSS for the output of the perceptron).
    • Use your algorithm on the CHD problem.
      • Report the parameters of the model (weights and R2).
      • Plot the SSE over time for 100 epochs of training for a learning rate that is too large (no convergence).
      • Plot the SSE over time for 100 epochs of training for a learning rate that is small enough (asymptotic error).
      • Plot the original (probability) data and graph the nonlinear model your program produced.
      • How do the three models you produced for the CHD problem compare?
    • Use your algorithm on this (modified) Iris problem.
      • Report the parameters of the model (weights and R2).
      • Plot the SSE over time for 1000 epochs of training for a learning rate that is too large (no convergence).
      • Plot the SSE over time for 1000 epochs of training for a learning rate that is small enough (asymptotic error).
      • Report the predictions your model makes for the following data (vectors of independent variables).
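
The fitting procedures in parts 1 and 3 can be sketched in Python as shown below. This is one possible reading of the specification, not a reference solution, and all function names are hypothetical. It makes three assumptions:

- Logistic regression is done by an unweighted least-squares fit on the log-odds of the aggregated probabilities, each of which must lie strictly between 0 and 1.
- R2 is computed as 1 - SSE/TSS.
- The perceptron is a single sigmoid unit trained online with the delta rule, starting from zero weights. The specification does not name the activation function.

```python
import math

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_line(xs, ys):
    """Ordinary least squares for y = w0 + w1 * x; returns (w0, w1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    return mean_y - w1 * mean_x, w1

def r_squared(targets, predictions):
    """R2 = 1 - SSE/TSS, computed on whatever scale the arguments use."""
    mean_t = sum(targets) / len(targets)
    sse = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    tss = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - sse / tss

def fit_logistic(xs, probs):
    """Fit a line to the log-odds, then score R2 on probabilities.
    Assumption: every probability is strictly between 0 and 1."""
    log_odds = [math.log(p / (1.0 - p)) for p in probs]
    w0, w1 = fit_line(xs, log_odds)
    predicted = [sigmoid(w0 + w1 * x) for x in xs]
    return w0, w1, r_squared(probs, predicted)

def predict(weights, row):
    """Output of the sigmoid unit; weights[0] is the bias w0."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], row)))

def train_perceptron(rows, targets, learning_rate, epochs):
    """Online delta-rule training; returns (weights, SSE recorded per epoch)."""
    weights = [0.0] * (len(rows[0]) + 1)
    sse_history = []
    for _ in range(epochs):
        sse = 0.0
        for row, target in zip(rows, targets):
            inputs = [1.0] + list(row)   # constant 1.0 feeds the bias
            out = predict(weights, row)
            error = target - out
            sse += error ** 2
            step = learning_rate * error * out * (1.0 - out)
            weights = [w + step * v for w, v in zip(weights, inputs)]
        sse_history.append(sse)
    return weights, sse_history
```

Plotting the returned sse_history against the epoch index gives the SSE-over-time curves requested above. Rerunning train_perceptron with different learning_rate values is one way to look for both the divergent and the asymptotic behaviour.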

Turn in a thoughtful, well-written report (see the guidelines above) that details your experiments and addresses the questions posed above (look carefully at everything to make sure you've covered all the parts of each).