CS335 Projects

The CS335 course project is your opportunity to apply state-of-the-art machine learning methods to one of three "impactful" application areas: (A) Health, (B) Education, or (C) Climate Change.

Aside from their intrinsic benefit, course projects are an opportunity to demonstrate your machine learning knowledge and potentially add to a portfolio you can use for job or graduate school applications. Previous students have even submitted outstanding course projects for publication!

Logistics

Project groups must have a minimum of 2 and a maximum of 3 students.

Students are expected to contribute equally to the project. The amount of work should scale with the number of students in the group and will be graded accordingly (groups of 3 students will be expected to produce more work than groups of 2 students).

Examples and resources will be focused on the three "impactful" application areas. However, students may propose a project that is outside these three areas if they can convince the instructor it is also impactful.

Grading

The project grade (30% of the course grade) is broken down as follows:

Idea proposal — 2%

Paper and group selection — 2%

Literature review — 5%

Weekly reports (4) — 8%

Final report — 13%

(1) Idea Proposal (Due: March 6, 11:59pm)

Every student must submit their own individual idea proposal (even if you already know your group members). The first step for every project is to generate as many ideas as possible.

Your idea proposal must contain several sentences for each of the following questions:

What is your research question?
What are 1-3 datasets that could be used to answer this research question?
What machine learning models/methods do you plan to use (e.g. logistic regression, neural networks, clustering)?

You will submit your idea proposal in two places:

Gradescope: Submit as a pdf.

Piazza: Submit as a post under the idea_proposal tab. You must use your name (do not post anonymously).

You will submit your idea proposal publicly so that other students can see and comment on your proposed idea. Proposals will be graded based on completion.

(2) Form groups (Before March 13)

After idea proposals are turned in on March 6, read the other students' posts on Piazza. Message or talk with other students to form groups before the paper and group selection submission on March 13.

(3) Paper and group selection (Due: March 13, 11:59pm)

A useful resource to search for relevant papers is Google Scholar.

For this submission, please answer the following questions:

What are the names of your group members?
What is your group's research question?
Let n be the number of members in your group. List 4*n papers that relate to your research question (note: you should skim, but NOT fully read, these papers). For each paper, provide the title, authors, publication venue, and URL.

Submit this assignment as a group to Gradescope. (Instructions for submitting an assignment as a group on Gradescope are here.)

(4) Literature Review (Due: March 27, 11:59pm)

Skim the papers you selected in Step 3 again. You should have gathered more papers than you actually plan to read. Select 2*n papers (where n is the number of group members) to read thoroughly.

For each paper chosen, answer the following questions:

What venue was the paper published in? Who are the authors? What are their backgrounds? How many times has the paper been cited? (~3-5 sentences)
What problem are the authors trying to solve? (~2 sentences)
Why is the problem important? (~2-5 sentences)
What mathematical notation do the authors use and what does this notation mean? This is often most useful represented as a table.
Make a list of five terms that you do not understand from your paper. For each, do a bit of research and write a brief description of the term. (∼2 sentences each.)
What datasets did the authors use (if any)? Why did they use these datasets? (~2 sentences)
What machine learning models did the authors use (if any)? How did they justify using these? (~2 sentences)
How did the authors know their approach was successful? (~10 sentences)
1. What evaluation metrics did they use? What do these metrics mean?
2. What baselines did they compare against?
3. Can you identify a problem with the authors' measure of success?
Research papers are almost always an improvement, reaction or twist on other research (“prior work”) that others have done before. Of all of the works cited in your paper, what seems to be the most important cited prior work? Explain why that citation seems most important. (4-6 sentences)
Research papers almost always make use of tools, methods and algorithms that have been developed by others. Of all of the works cited in your paper, what citation of a tool, method or algorithm seems most important? Explain why. (4-6 sentences)
Pick one equation from the paper and explain what it means using a mix of prose and mathematical notation (~5 sentences).
How is your project similar to and different from this paper? (4-5 sentences)

As a group, submit this as a single pdf document to Gradescope.

(5) Weekly reports (Due: April 3, 10, 17, and 24, 11:59pm)

After the literature review, you will submit four weekly reports. The purpose of the reports is to make sure you are on track with your project and to uncover any major problems early, so you can quickly change course if necessary. The reports should answer the following questions:

What activities did you do this week?
What preliminary results do you have?
What issues have you encountered?
What questions do you have?
What are your concrete steps for next week?
What are your longer-term to-do items?

You will submit in two places:

Gradescope: As a group, submit a single pdf document.

Piazza: Submit as a post under the weekly_report_k tab (where k is the week). The purpose of the public posts is for you to learn from your classmates and share information.

(6) Final Report (Due: May 4 at 11:59pm)

There will be no project presentations (due to COVID-19). The final report will count for 13% of each student's grade, and one grade will be assigned to the entire project group.

The goal of the final report is for students to practice technical writing by conveying the contributions of their project to an audience with an adequate machine learning background. Reports will be judged on both their technical quality (mastery of machine learning concepts) and their writing proficiency.

Here is the rubric for the final report:

The final report should be an 8-12 page document. (If the report is not within this page range, the project grade will be reduced by 10%.)
The report must be submitted as a pdf to Gradescope. It must be written in LaTeX using the NeurIPS 2019 style files (5% adherence to style requirements).
The report must consist of the following sections (5% adherence to section requirements):
- Abstract. At a high level, summarize what your problem is, what methods you used, and your results. An abstract that is shorter and more concise is better.
- Introduction. What is your problem? Why is it important?
- Related work. What have people previously done with regard to your problem? What work is related? (This is a great place to summarize some of the papers you wrote about in depth for the Literature Review assignment.)
- Dataset. Describe what dataset(s) you are using, where these came from, and some basic properties of the dataset.
- Methods. What methods are you using? What machine learning models are you using and why?
- Results and evaluation. What are your results? How did you evaluate these results?
- Conclusion. What can you conclude from your project? What did you learn? What are future directions? Are there any real-world implications from your work?
- References (bibliography). List citations here. Use the NeurIPS style file for examples of how to cite certain works.
Projects must adequately cite at least 8 research articles (5%).
The report must also contain the following:
- At least two figures. These figures could show the results, interesting analyses, exploration of the features, an overview of the data and modeling pipeline, etc. (20%)
- At least one table. The table could consist of data statistics, results, etc. (10%)
Writing mechanics: grammar and typos (10%).
Writing clarity: high-level writing style and arguments conveyed effectively (10%).
Mastery of machine learning concepts (30%). Evaluation on this category could include (but is not limited to):

Proper train-test split (or train-dev-test split or cross-validation)
Proper data-driven selection of hyperparameters
Comparing models against a baseline (e.g. predicting the majority class)
Comparing more than one machine learning model
Proper use of a machine learning package (e.g. sklearn or PyTorch) or development of a new machine learning model.
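As an illustration only, the checklist above might look like the following minimal scikit-learn sketch. It assumes a binary classification task and uses scikit-learn's built-in breast cancer dataset purely as a stand-in for your own data. The function name evaluate_models, the choice of accuracy as the metric, the two candidate models, and the hyperparameter grids are all hypothetical choices for this example, not requirements of the rubric. The pattern it shows: hold out a test set, select hyperparameters with cross-validation on the training data only, compare against a majority-class baseline, and compare more than one model.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_models(random_state=0):
    """Hypothetical helper: compare a majority-class baseline and two tuned models."""
    # Stand-in dataset (assumption): replace with your project's features and labels.
    X, y = load_breast_cancer(return_X_y=True)

    # Hold out 20% of the data as a test set; it is used only for the final evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )

    # Candidate models, each with a small, illustrative hyperparameter grid.
    candidates = {
        "logistic_regression": (
            make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
            {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
        ),
        "random_forest": (
            RandomForestClassifier(random_state=random_state),
            {"n_estimators": [50, 100], "max_depth": [None, 5]},
        ),
    }

    results = {}

    # Baseline: always predict the most frequent class seen in the training labels.
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    results["majority_baseline"] = accuracy_score(y_test, baseline.predict(X_test))

    for name, (estimator, grid) in candidates.items():
        # 5-fold cross-validation on the training set chooses the hyperparameters.
        search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
        search.fit(X_train, y_train)
        # By default GridSearchCV refits the best configuration on all training data.
        results[name] = accuracy_score(y_test, search.predict(X_test))
        print(f"{name}: best hyperparameters {search.best_params_}")

    return results


if __name__ == "__main__":
    for name, acc in evaluate_models().items():
        print(f"{name}: test accuracy = {acc:.3f}")
```

Keeping the test set out of the hyperparameter search is the key design choice here: the reported test accuracy then estimates performance on unseen data rather than on data used to pick the model. For imbalanced datasets, a metric other than accuracy (e.g. F1 or AUC) may be more informative than the accuracy used in this sketch.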