Instructor: Marina Meila
mmp at stat dot washington dot edu
Canvas course site
Optional textbook: "Mining of Massive Data Sets" by Jure Leskovec, Anand Rajaraman and Jeff Ullman. For each lecture, I will point out the chapters that are relevant. The book contents will be supplemented with material in the form of lecture notes.
Lectures: Tuesdays and Thursdays 10:00-11:20 in Johnson 111 or on Zoom
Instructor's office hours: Mondays 2:00-2:50 on Zoom
TAs: Ronak Mehta and Cheng-Yu Hsieh
Course web page: http://www.stat.washington.edu/courses/548/win22 (this page) will be used to post lecture notes (prior to each lecture) and homework assignments. The resources page is useful for additional reading and software resources. All other materials will be posted on Canvas. Everything posted here that is directly relevant to the learning objectives will be linked to from Canvas, so you won't need to visit this web site once you are registered.
Class mailing list: multi_cse547a_wi22 at UW will be used sparingly by me mainly for announcements, but is open for posting by everyone on the list.
What will the course be about?
Who is this class for?
Prerequisites
Learning Goals
Syllabus
Grading: The grade is based on homework + quizzes (75%), project (20%), and class participation (5%). These percentages are approximate, and may change by up to 5%.
With the exception of generic libraries (such as plotting or matrix functions), you must write your own code. In particular, you are not to use MATLAB, R or Python code for machine learning that is available on the web or with the textbook(s).
Notice on Zoom class activities
We plan to record the lectures, but not the office hours or other class activities that may take place on Zoom. If we decide that the learning goals are better achieved by recording these activities, we will announce it in advance.
Any recordings we may make will only be accessible to students enrolled in the course to review materials. These recordings will not be shared with or accessible to the public.
The University and Zoom have FERPA-compliant agreements in place to protect the security and privacy of UW Zoom accounts.
Staying safe from infectious diseases during the in-person lectures: read the information TBPosted. Updates will be made as circumstances change and our experience grows.
Religious accommodation: Washington state law requires that UW develop a policy for accommodation of student absences or significant hardship due to reasons of faith or conscience, or for organized religious activities. The UW’s policy, including more information about how to request an accommodation, is available at Religious Accommodations Policy. Accommodations must be requested within the first two weeks of this course using the Religious Accommodations Request Form.
The UW food pantry: A student should never have to make the choice between buying food or textbooks. The UW Food Pantry helps mitigate the social and academic effects of campus food insecurity. They aim to lessen the financial burden of purchasing food by providing students with access to food and hygiene products at no cost. Students can expect to receive 4 to 5 days’ worth of supplemental food support when they visit the Pantry. For information including operating hours, location, and additional food support resources visit The UW Food Pantry. They can be found on the North side of West Campus’ Poplar Hall at the corner of Brooklyn Ave NE and 41st.
Last modified: Fri Sep 21 11:20:19 PDT 2018
The course will be about methodological aspects of doing machine learning on big data. Big data allows our models to become gradually more complex as more data become available; the study of such models is an area called non-parametric statistics.
A fundamental aspect of many non-parametric models is that they depend on the neighbors of a data point. Hence, search for neighbors, or similar items, in large data sets will be an important skill that we will develop.
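To make the idea of neighbor search concrete, here is a minimal sketch of a brute-force k-nearest-neighbor query in Python. It is only an illustration and not part of the course materials: the function name `k_nearest_neighbors`, the sample points, and the choice of Euclidean distance are all assumptions made for this example.

```python
import heapq
import math


def k_nearest_neighbors(points, query, k):
    """Return the indices of the k points closest to `query`.

    Hypothetical helper: Euclidean distance and a full linear scan
    are assumptions of this sketch.
    """
    # Pair each distance with the index of its point, then keep the k smallest.
    distances = ((math.dist(p, query), i) for i, p in enumerate(points))
    return [i for _, i in heapq.nsmallest(k, distances)]


# Made-up sample data for illustration only.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.2)]
neighbors = k_nearest_neighbors(points, query=(0.4, 0.1), k=2)
print(neighbors)  # indices of the two closest points, nearest first
```

A linear scan like this computes one distance per data point for every query, which becomes expensive on large data sets.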
This class is a core class in the Machine Learning/Big Data PhD Track in Statistics. For any Statistics PhD student who wants to learn Machine Learning/Big Data, this class is part of the triplet of graduate courses 535 --> 538/548. The prerequisite is EITHER CSE 546 (Machine Learning) OR STAT 535 (Foundations of Machine Learning). Either prerequisite is accepted for enrolling in any of STAT 548A, B or CSE 547.
Capacity permitting, the class is open to other graduate students with
an interest in statistics, algorithms and computing, in particular to
students involved in Machine Learning and Big Data research across campus.
Format: The course will consist of two weekly lectures, a series of homework assignments, a few quizzes, and a project. The current plan for this hybrid course is as follows. Please take into account that this plan may change with the dynamics of the pandemic. The instructors and the TAs will make every effort to ensure continuity and focus on the core learning objectives in these unpredictable circumstances.
The TAs will offer optional Tutorials where they will go over certain basic
topics in more depth. In particular, some of the classes in the first part of the course will be partly flipped, and for these I encourage you to participate in the TA office hour.
Lectures: mostly in-person, some on-line (pre-announced). On-line classes will be recorded.
TA tutorials/office hour: TBA am on-line
Instructor office hour: Mondays 2:00-2:50 on-line
Quizzes: during lecture time (about 12 min)
The grading policies described here apply to students in good standing; students who engage in misconduct will be reported to the office of Community Standards & Student Conduct. Their grades may be handled differently.
Submit each homework as a single .pdf file through Canvas (with the exception of programming assignments, which will be discussed below).
The assignments will consist of (1)
programming assignments (typically, to implement a version or a
special case of an algorithm presented in the lecture) to be done in
the programming language of your choice and (2) problems or other
questions, including proofs. The programming assignments will be split
into two separate parts:
Late homeworks will be accepted in exceptional circumstances. Please let us know in advance if you think you will be late.
Teamwork: Each class participant
submits her/his homework individually. Unless explicitly allowed to do
so, you are required to write your own code. Discussing
homework questions is acceptable as long as hints or solutions are not asked for or given; for example, a discussion to clarify what a question requires, on the discussion board or elsewhere, is acceptable. Please list on your homework the names of all the people you discussed with.
On the discussion board: Answer questions marked as counting for participation, whether asked by the instructor, the TAs, or other students. Note: discussion about the homework typically does not qualify for participation, but there are exceptions. For example, if you find an error in the homework and are the first to point it out to us, congratulations! We consider that participation.
How much participation is enough? Once a week on average, either in class or on the discussion board, is sufficient for a full grade.
Note that class attendance by itself is not graded as participation, and is not required.
You will submit a report describing what you did, submit your code, and test your trained predictor(s) on a test set with hidden labels. Evaluation of the project is based partly on the report and partly on the test set results. Time permitting, we will have short presentations/Q&A during the last lecture time.
More details about the project will be announced around week 3.