CS335 Projects

The CS335 course project is your opportunity to apply state-of-the-art machine learning methods to one of three "impactful" application areas: (A) Health, (B) Education, or (C) Climate Change.

Beyond their intrinsic value, course projects are an opportunity to demonstrate your machine learning knowledge and potentially add to a portfolio you can use for job or graduate school applications. Previous students have even submitted outstanding course projects for publication!

Logistics

  • Project groups must have a minimum of 2 and a maximum of 3 students.
  • Students are expected to contribute equally to the project. The amount of work should scale with the number of students in the group and will be graded accordingly (groups of 3 students are expected to produce more work than groups of 2).
  • Examples and resources will be focused on the three "impactful" application areas. However, students may propose a project that is outside these three areas if they can convince the instructor it is also impactful.
  • Grading

    The project grade (30% of the course grade) is broken down as follows:
  • Idea proposal — 2%
  • Paper and group selection — 2%
  • Literature review — 5%
  • Weekly reports (4) — 8%
  • Final report — 13%
    (1) Idea Proposal (Due: March 6, 11:59pm)

    Each of you must submit your own idea proposal, even if you already know your group members. The first step for every project is to generate as many ideas as possible.

    Your idea proposal must answer each of the following questions in several sentences:
    1. What is your research question?
    2. What are 1-3 datasets that could be used to answer this research question?
    3. What machine learning models/methods do you plan to use (e.g. logistic regression, neural networks, clustering)?
    Submit your idea proposal in two places:
  • Gradescope: Submit as a PDF.
  • Piazza: Submit as a post under the idea_proposal tab. You must use your name (do not post anonymously).

    Idea proposals are posted publicly so that other students can read and comment on your proposed idea. Proposals will be graded on completion.

    (2) Form groups (Before March 13)

    After idea proposals are turned in on March 6, read the other students' posts on Piazza. Message or talk with other students to form groups before the paper and group selection is due on March 13.

    (3) Paper and group selection (Due: March 13, 11:59pm)

    Google Scholar is a useful resource for finding relevant papers.

    For this submission, please answer the following questions:
    1. What are the names of your group members?
    2. What is your group's research question?
    3. Let n be the number of members in your group. List 4n papers related to your research question (Note: you should skim, but NOT fully read, these papers). For each paper, provide the title, authors, publication venue, and URL.
    Submit this assignment as a group to Gradescope. (Gradescope provides instructions for submitting an assignment as a group.)

    (4) Literature Review (Due: March 27, 11:59pm)

    Skim the papers you selected in Step 3 again. You should have gathered more papers than you actually plan to read; select 2n of them (where n is the number of group members) to read thoroughly.

    For each paper chosen, answer the following questions:
    1. What venue was the paper published in? Who are the authors? What are their backgrounds? How many times has the paper been cited? (~3-5 sentences)
    2. What problem are the authors trying to solve? (~2 sentences)
    3. Why is the problem important? (~2-5 sentences)
    4. What mathematical notation do the authors use and what does this notation mean? This is often most useful represented as a table.
    5. Make a list of five terms that you do not understand from your paper. For each, do a bit of research and write a brief description of the term. (~2 sentences each)
    6. What datasets did the authors use (if any)? Why did they use these datasets? (~2 sentences)
    7. What machine learning models did the authors use (if any)? How did they justify using these? (~2 sentences)
    8. How did the authors know their approach was successful? (~10 sentences)
      1. What evaluation metrics did they use? What do these metrics mean?
      2. What baselines did they compare against?
      3. Can you identify a problem with the authors' measure of success?
    9. Research papers are almost always an improvement, reaction or twist on other research (“prior work”) that others have done before. Of all of the works cited in your paper, what seems to be the most important cited prior work? Explain why that citation seems most important. (4-6 sentences)
    10. Research papers almost always make use of tools, methods and algorithms that have been developed by others. Of all of the works cited in your paper, what citation of a tool, method or algorithm seems most important? Explain why. (4-6 sentences)
    11. Pick one equation from the paper and explain what it means using a mix of prose and mathematical notation (~5 sentences).
    12. How is your project similar to and different from this paper? (4-5 sentences)

    As a group, submit this as a single pdf document to Gradescope.

    (5) Weekly Reports (Due: April 3, 10, 17, and 24, 11:59pm)

    After the literature review, you will submit four weekly reports. The purpose of the reports is to make sure you are on track with your project and uncover any major problems so you can quickly change course if necessary. The reports should answer the following questions:
    1. What activities did you do this week?
    2. What preliminary results do you have?
    3. What issues have you encountered?
    4. What questions do you have?
    5. What are your concrete steps for next week?
    6. What are your longer-term to-do items?

    Submit each weekly report in two places:
  • Gradescope: As a group, submit a single PDF document.
  • Piazza: Submit as a post under the weekly_report_k tab (where k is the week). The purpose of the public posts is for you to learn from your classmates and share information.

    (6) Final Report (Due: May 4, 11:59pm)

    Due to COVID-19, there will be no project presentations. The final report counts for 13% of each student's grade, and a single grade will be assigned to the entire project group.

    The goal of the final report is for students to practice technical writing by conveying the contributions of their project to an audience with an adequate machine learning background. Reports will be judged on both their technical quality (mastery of machine learning concepts) and their writing proficiency.

    Here is the rubric for the final report:


    Resources

    Health

  • Academic Paper: Improved protein structure prediction using potentials from deep learning
  • Academic Paper: A distributional code for value in dopamine-based reinforcement learning
  • Machine Learning for Healthcare Conference
  • Blog1 and Blog2, listing free datasets for healthcare
  • 538 Politics, Who's Using Drugs Dataset

    Education

  • Data Mining for Education
  • International Conference on Intelligent Tutoring Systems
  • Berkeley Algorithms and Computing for Education Lab
  • World Bank Education Statistics
  • Educational Process Mining Dataset
  • Prof. Andrew Lan's work

    Climate Change

  • Rolnick et al. 2019, Tackling Climate Change with Machine Learning, and the accompanying resource page
  • ICML 2019 Workshop Climate Change: How Can AI Help?
  • NeurIPS 2019 Workshop Tackling Climate Change with Machine Learning
  • Federal Climate Change datasets
  • Mt. Holyoke College Weather Dataset

    Other resources for ideas

  • Bloomberg Data for Good Exchange
  • Data Science for Social Good