CS 335: Machine Learning


Logistics

Lectures: Tues, Thurs 11:30am-12:45pm
Fourth Hour: Fri 8:30am-9:20am
Room: Clapp Laboratory 206
Office hours: Tues 1-3pm, Thurs 9:15-11:15am, Clapp 200
Piazza: https://www.piazza.com/mtholyoke/spring2020/cs335/home
Gradescope: https://www.gradescope.com/courses/76996
Moodle: https://moodle.mtholyoke.edu/course/view.php?id=17913


Learning Goals

The goals of this course are to:
  • Understand the general mathematical and statistical principles that allow one to design machine learning algorithms.
  • Identify, understand, and implement specific, widely used machine learning algorithms.
  • Learn how to apply and evaluate the performance of machine learning algorithms.

More concretely, by the end of the semester, students will be expected to have mastered fundamental concepts in machine learning and to be able to:
  • Derive analytical solutions using the mathematical fundamentals of ML (probability, matrix and vector manipulation, partial derivatives, basic optimization, etc.).
  • Derive and implement learning algorithms.
  • Identify when an algorithm is overfitting, and evaluate the relationships among regularization, training set size, training accuracy, and test accuracy.

Students will also be expected to apply machine learning to a project and, in the process:
  • Identify real-world problems where machine learning can have an impact.
  • Implement machine learning tools on real data and evaluate their performance.
  • Communicate technical ideas and procedures proficiently, both orally and in writing.

Grading

  • Homeworks (4) — 40%
  • "Celebrations of learning" (2) — 20%
  • Project — 30%
  • Class engagement — 10%


The project grade (30%) is further broken down as follows:
  • Idea proposal — 2.5%
  • Paper selection — 2.5%
  • Literature review — 5%
  • Project plan — 5%
  • Final presentation (poster session) — 5%
  • Final report — 10%

Class engagement grades are a composite of ungraded activities, including but not limited to: (a) class discussion, (b) in-class worksheets, (c) posting on Piazza, and (d) optional work (like HW0) and fourth-hour work.
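
As a rough illustration of how these weights combine, the sketch below computes an overall score as a weighted sum. It assumes that each component is recorded as a fraction between 0 and 1 and that the course grade is a plain weighted average of the listed components. The helper names (`weighted_score`, `project_score`) and the dictionary keys are hypothetical and are not part of any course software.

```python
"""Sketch: combine component scores using the syllabus weights.

Assumption: every score is a fraction in [0, 1] and the course grade is a
plain weighted sum. Names below are hypothetical, for illustration only.
"""

# Top-level weights from the Grading list (they sum to 1.0).
COURSE_WEIGHTS = {
    "homework": 0.40,
    "celebrations_of_learning": 0.20,
    "project": 0.30,
    "class_engagement": 0.10,
}

# Project sub-weights, expressed as shares of the whole course grade (sum 0.30).
PROJECT_WEIGHTS = {
    "idea_proposal": 0.025,
    "paper_selection": 0.025,
    "literature_review": 0.05,
    "project_plan": 0.05,
    "final_presentation": 0.05,
    "final_report": 0.10,
}


def weighted_score(weights, scores):
    """Return sum(weight * score) over all components; every component is required."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError("missing scores for: " + ", ".join(sorted(missing)))
    return sum(weight * scores[name] for name, weight in weights.items())


def project_score(scores):
    """Project score as a fraction of the project's own 30% share."""
    return weighted_score(PROJECT_WEIGHTS, scores) / sum(PROJECT_WEIGHTS.values())


if __name__ == "__main__":
    project = project_score({
        "idea_proposal": 1.0,
        "paper_selection": 1.0,
        "literature_review": 0.8,
        "project_plan": 0.9,
        "final_presentation": 1.0,
        "final_report": 0.85,
    })
    final = weighted_score(COURSE_WEIGHTS, {
        "homework": 0.92,
        "celebrations_of_learning": 0.85,
        "project": project,
        "class_engagement": 1.0,
    })
    print(f"project: {project:.3f}, overall: {final:.3f}")  # project: 0.900, overall: 0.908
```

Expressing the project sub-weights as shares of the whole grade makes it easy to check that they add up to the project's 30%.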

    Homework deadlines are strict. For homework that is late, you will be penalized 33% of the assignment’s value for each day or fraction thereof that it is late (0–24 hours = 33% penalty; 24–48 hours = 66% penalty; 48+ hours = no credit). An assignment is considered late until all components (written and code) are submitted.
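
The late-penalty rule can be made concrete with a small calculation. This sketch reads "33% of the assignment's value" as a fixed deduction of 33% of the maximum points for each late day, with the result floored at zero. It also makes three further assumptions:
  • A submission exactly 48 hours late falls in the 66% tier.
  • Lateness is measured from the last component submitted.
  • The example dates are in 2020.
The function names are hypothetical, for illustration only.

```python
import math
from datetime import datetime
from typing import List

PENALTY_PERCENT_PER_DAY = 33  # syllabus: 33% of the assignment's value per late day


def hours_late(deadline: datetime, component_times: List[datetime]) -> float:
    """Hours from the deadline to the LAST component (written or code) submitted."""
    last_submission = max(component_times)  # late until every component is in
    return (last_submission - deadline).total_seconds() / 3600


def late_adjusted_score(points: float, max_points: float, hours: float) -> float:
    """Apply the late penalty to a raw score.

    Assumption: "the assignment's value" means max_points, so each late day
    deducts a fixed 33% of max_points, and the result is floored at zero.
    """
    if hours <= 0:
        return points  # on time: no penalty
    days_late = math.ceil(hours / 24)  # any fraction of a day counts as a full day
    if days_late >= 3:
        return 0.0  # more than 48 hours late: no credit
    deduction = max_points * PENALTY_PERCENT_PER_DAY * days_late / 100
    return max(0.0, points - deduction)


if __name__ == "__main__":
    deadline = datetime(2020, 2, 7, 23, 59)  # e.g. Homework 1: Fri Feb 7, 11:59pm
    written = datetime(2020, 2, 7, 22, 0)    # written part submitted on time
    code = datetime(2020, 2, 8, 10, 29)      # code part submitted the next morning
    h = hours_late(deadline, [written, code])
    print(h)                                 # 10.5
    print(late_adjusted_score(90, 100, h))   # 57.0
```

Under the other possible reading (33% of the earned score rather than of the maximum), the deduction line would use `points` instead of `max_points`.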

    There will be two "celebrations of learning." These will be in-class, closed-book exams.

Course schedule

Week 1
    Jan 21 (Tu): Introduction and Logistics
        Homework 0, due Fri Jan 31, 11:59pm
    Jan 23 (Th): Linear Regression
    4th hour (Fri): Calculus review
Week 2
    Jan 28 (Tu): Gradient descent [code]
    Jan 30 (Th): Linear algebra for ML [code]
        Homework 1 [files], due Fri Feb 7, 11:59pm
    4th hour (Fri): Python review [soln]
Week 3
    Feb 4 (Tu): Multivariate linear regression
    Feb 6 (Th): Normal equations and vectorized gradient descent [code]
    4th hour (Fri)
        Homework 2 [files], due Fri Feb 14, 11:59pm
Week 4
    Feb 11 (Tu): Logistic Regression
    Feb 13 (Th): Evaluating Models
    4th hour (Fri): Probability and logarithms review
        Homework 3 [files], due Fri Feb 21, 11:59pm
Week 5
    Feb 18 (Tu): Overfitting & Regularization [code]
        [End Unit 1]
    Feb 20 (Th): Multi-Class Classification
    4th hour (Fri): Review Unit 1
Week 6
    Feb 25 (Tu): Project descriptions and Ethics in ML
        Idea Proposal, due Mar 6, 11:59pm
    Feb 27 (Th): Celebration of Learning 1 (in class)
    4th hour (Fri): Project brainstorm/discussion
        Homework 4, due Tues Mar 10, 11:59pm
Week 7
    Mar 3 (Tu): KNN and decision trees
    Mar 5 (Th)
    4th hour (Fri): Project brainstorm/discussion
Week 8
    Mar 10 (Tu): Kernel Trick
        Paper Selection, due Mar 13, 11:59pm
    Mar 12 (Th)
    4th hour (Fri): Literature review practice
        Literature Review, due Mar 27, 11:59pm
Week 9
    Mid-semester break: no class
Week 10
    Mar 24 (Tu): Neural nets + Backprop
    Mar 26 (Th)
    4th hour (Fri)
        Project Plan, due Apr 10, 11:59pm
Week 11
    Mar 31 (Tu): PCA
    Apr 2 (Th)
    4th hour (Fri)
Week 12
    Apr 7 (Tu): Community Day (no classes)
    Apr 9 (Th): Bayesian classification
    4th hour (Fri): Clustering
Week 13
    Apr 14 (Tu)
        Project Final Report, due Apr 28, 11:59pm
    Apr 16 (Th): [End Unit 2]
    4th hour (Fri): Review
Week 14
    Apr 21 (Tu): Project Work Day
    Apr 23 (Th): Celebration of Learning 2 (in class)
    4th hour (Fri): Project Work Day
Week 15
    Apr 28 (Tu): Project Final Presentations

Project

    Projects will focus on "impactful" applications of machine learning in one of three areas: (1) health, (2) education, or (3) climate change. Project details will be posted later in the course.

Resources

    This course has no official textbook. However, some of the lecture material will be drawn from the following resources, and you may find them helpful:
  • An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani: an accessible undergraduate machine learning textbook with a statistics focus.
  • Course handouts from Stanford CS 229 by Andrew Ng

Python

    Programming assignments will use Python, NumPy, and SciPy. The required Python environment is the Anaconda 2019.10 distribution of Python 3.7. If you are not working in this environment, your work will not be graded and you will not receive help debugging.

    See this page on getting started with Python for CS 335.
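
You can catch a mismatched interpreter by checking the Python version and the required packages before starting an assignment. This is a minimal sketch: it only confirms Python 3.7 and that NumPy and SciPy can be imported. It does not verify that the interpreter comes from the Anaconda 2019.10 distribution, and the function name `environment_problems` is hypothetical.

```python
import importlib
import sys

REQUIRED_PYTHON = (3, 7)                # Anaconda 2019.10 distribution of Python 3.7
REQUIRED_PACKAGES = ("numpy", "scipy")  # libraries used in the programming assignments


def environment_problems(version_info=None):
    """Return a list of problems found; an empty list means the checks passed."""
    if version_info is None:
        version_info = sys.version_info
    problems = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) != REQUIRED_PYTHON:
        problems.append(
            f"expected Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}, found {major}.{minor}"
        )
    for name in REQUIRED_PACKAGES:
        try:
            importlib.import_module(name)
        except ImportError:
            problems.append(f"cannot import {name}")
    return problems


if __name__ == "__main__":
    issues = environment_problems()
    print("\n".join(issues) if issues else "Python version and packages look OK")
```

Passing `version_info` explicitly, for example `environment_problems((3, 11, 2))`, lets you see the message for a wrong interpreter without installing one.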

    Resources for Python:
  • Google's Python class
  • Norm Matloff’s Fast Lane to Python
  • Stanford CS 231n Python NumPy Tutorial
  • Stanford CS 231n IPython tutorial

Academic Honesty

    The Computer Science Department follows the Mount Holyoke College Honor Code. Work submitted for grading must be entirely your own unless you were instructed to work in groups. Course assignments are meant to help you practice skills, gain a deeper understanding of the course material, and apply that knowledge to new situations. They are designed to challenge you, stimulate critical thinking, and help you understand the concepts related to the course. Your grade is a reflection of your understanding of the material.

    We recognize that collaboration can help you master course material, and there are certain ways in which we encourage you to collaborate: discussing course content at a high level, getting hints or debugging help, talking about problem-solving strategies, and discussing ideas together. However, you must do all coding and write-ups on your own. Writing code and solutions on your own will test and demonstrate your mastery of the course material. Looking at solutions from other students or any other source (including the web), or collaborating to write solutions to individual work, is considered a violation of the Honor Code. All suspected violations will be referred to the academic honor board. If you are uncertain whether something is allowed, it is your responsibility to ask.

    If you have engaged in any of the acceptable collaboration activities above for an assignment, you MUST acknowledge the classmates or TAs with whom you spoke in a comment at the top of your main submission file. Note that the Association for Computing Machinery (ACM) has a strong Code of Ethics and Professional Conduct; you can read its 2018 version on the ACM website.

    Do:

  • Organize study groups.
  • Clarify ambiguities or vague points in class handouts, textbooks, assignments, and labs.
  • Discuss assignments at a high level to understand what is being asked for, and to discuss related concepts and the high-level approach.
  • Refine high-level ideas/concepts for projects (i.e., brainstorming).
  • Outline solutions to assignments with others using diagrams or pseudocode, but not actual code.
  • Walk away from the computer or write-up to discuss conceptual issues if you get stuck.
  • Get or give help on how to operate the computer, terminal, or course software.
  • Get or give limited debugging help. Debugging includes identifying a syntax or logical error but not helping to write or rewrite code.
  • Submit the result of collaborative coding work if and only if group work is explicitly permitted (or required).

    Don't:

  • Look at another student’s solutions.
  • Use solutions to the same or similar problems found online or elsewhere.
  • Search for homework solutions online.
  • Turn in any part of someone else's work as your own (with or without their knowledge).
  • Share your code or written solutions with another student.
  • Share your code or snippets of your own code online.
  • Save your work in a public place, such as a public github repository.
  • Allow someone else to turn in your work as their own. (Be sure to disconnect your network drive when you log out, and remove any printouts promptly from printers.)
  • Collaborate while writing programs or solutions to written problems. (But see above about specific ways to give or get debugging help.)
  • Write homework assignments together unless it is specified as a group assignment.
  • Collaborate with anyone outside your group for a group assignment.
  • Use resources during a quiz or exam beyond those explicitly allowed in the quiz/exam instructions. (If it is not listed, don’t use it. Ask if you are unsure.)
  • Submit the same or similar work in more than one course. (Always ask the instructor whether it is OK to reuse any part of a project from a different course in this one.)

Inclusion and Equity

    The instructor and students in CS 335 are expected to be respectful and inclusive of all students and not to discriminate. Mount Holyoke resources on diversity, equity, and inclusion can be found here. Bias incidents can be reported here. Students are encouraged to bring concerns or feedback to the attention of the instructor.

Accommodations

    If you have a disability and would like to request accommodations, please contact AccessAbility Services, located in Wilder Hall B4, at (413) 538-2634 or accessability-services@mtholyoke.edu. If you are eligible, they will give you an accommodation letter, which you should bring to the instructor as soon as possible.

Communication Policy

    The instructor will respond to email and Piazza in a timely manner, Monday through Friday, 9am-5pm. Messages sent during evenings and weekends will not be answered until the following business day, except in extreme circumstances. Therefore, start homework assignments and projects early to give yourself enough time to ask questions and receive answers.