Instructor: Marina Meila
mmp at stat dot washington dot edu
Canvas course site
Optional Textbook: "Mining of Massive Data Sets" by Jure Leskovec, Anand Rajaraman and Jeff Ullman. For each lecture, I will point out the chapters that are relevant. The book contents will be supplemented with material in the form of lecture notes.
Lectures: Mondays & Fridays 2-3:20 in MLR 301
Instructor's office hours: TBD on zoom ??
TA: Medha Agarwal
Course web page: http://www.stat.washington.edu/courses/548/sp25 (this page) will be used to post lecture notes prior to lecture, and homeworks. The resources page is useful for additional reading and software resources. All other materials will be posted on Canvas. Everything posted here that is directly relevant to the learning objectives will be linked to from Canvas, hence you won't need to visit the web site once you are registered.
Class mailing list: multi_cse547a_wi22 at UW will be used sparingly by me mainly for announcements, but is open for posting by everyone on the list.
What will the course be about?
Who is this class for?
Prerequisites
Learning Goals
Syllabus
Grading: The grade is based on homework +
quizzes (approx 75%), project (approx 20%), class
participation (approx 5%). These percentages are approximate, and may change by up to 5%.
With the exception of generic libraries (like plotting, matrix functions) you must write your own code. In particular, you are not to use MATLAB, R or Python code for machine learning that is available on the web or with the textbook(s).
Religious accommodation: Washington state law requires that UW develop a policy for accommodation of student absences or significant hardship due to reasons of faith or conscience, or for organized religious activities. The UW's policy, including more information about how to request an accommodation, is available at Religious Accommodations Policy. Accommodations must be requested within the first two weeks of this course using the Religious Accommodations Request Form.
The UW food pantry: A student should never have to make the choice between buying food or textbooks. The UW Food Pantry helps mitigate the social and academic effects of campus food insecurity. They aim to lessen the financial burden of purchasing food by providing students with access to food and hygiene products at no cost. Students can expect to receive 4 to 5 days' worth of supplemental food support when they visit the Pantry. For information including operating hours, location, and additional food support resources visit The UW Food Pantry. They can be found on the North side of West Campus' Poplar Hall at the corner of Brooklyn Ave NE and 41st.
Last modified: Fri Sep 21 11:20:19 PDT 2018
The course will be about methodological aspects of doing machine learning on big data. Big data allows our models to become gradually more complex as the amount of data grows; the study of such models is the area called non-parametric statistics.
A fundamental aspect of many non-parametric models is that they depend on the neighbors of a data point. Hence, search for neighbors, or similar items, in large data sets will be an important skill that we will develop.
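As a rough illustration of what "search for neighbors" means, here is a minimal sketch of brute-force k-nearest-neighbor search in Python. It is not taken from the course materials: the function name k_nearest_neighbors, the toy points, and the choice of Euclidean distance are all assumptions made for the example. Its cost grows linearly with the number of data points for every query, which is exactly what becomes a problem on large data sets.

```python
import heapq
import math


def k_nearest_neighbors(data, query, k):
    """Return indices of the k points in `data` closest to `query`.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    # Brute-force scan: one distance computation per data point.
    distances = (
        (math.dist(point, query), index) for index, point in enumerate(data)
    )
    # nsmallest returns the k smallest (distance, index) pairs, closest first.
    return [index for _, index in heapq.nsmallest(k, distances)]


# Toy data set (made up for this example).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
neighbors = k_nearest_neighbors(points, (0.9, 0.1), k=2)
print(neighbors)
```

Faster methods generally avoid scanning every point for every query, typically by building an index over the data in advance.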
This class is a core class in the Machine Learning/Big Data PhD Track in Statistics. For any Statistics PhD student who wants to learn Machine Learning/Big Data, this class is part of the triplet of graduate courses 535 --> 538/548. Prerequisites are EITHER CSE 546 (Machine Learning) or STAT 535 (Foundations of Machine Learning). Either prerequisite is accepted to enroll in any of STAT 548A, B or CSE 547.
Capacity permitting, the class is open to other graduate students with
an interest in statistics, algorithms and computing, in particular to
students involved in Machine Learning and Big Data research across campus.
Format: The course will consist of two weekly lectures, a
series of homework assignments, a few quizzes, and a project.
The TA will offer optional Tutorials where they will go over certain basic
topics in more depth. In particular, some of the classes in the first part of the course will be partly flipped, and for these I encourage you to participate in the TA office hour.
The grading policies described here apply to students in good standing; students who engage in misconduct will be reported to the office of Community Standards & Student Conduct. Their grades may be handled differently.
Submit each homework as a single .pdf file through Canvas (with the exception of programming assignments, which are discussed below).
The assignments will consist of (1)
programming assignments (typically, to implement a version or a
special case of an algorithm presented in the lecture) to be done in
the programming language of your choice and (2) problems or other
questions, including proofs. The programming assignments will be split
into two separate parts:
Late homeworks will be accepted in exceptional circumstances. Please let us know in advance if you think you will be late.
Teamwork: Each class participant
submits her/his homework individually. Unless explicitly allowed to do
otherwise, you are required to write your own code. Discussing
homework questions is acceptable as long as hints or solutions are not asked for or given; for example, a discussion to clarify what a question requires, on the discussion board or elsewhere, is acceptable. Please list on your homework the names of all the people you discussed it with.
On the discussion board: Answer questions marked as counting for participation, whether asked by the instructor, the TAs, or other students. Note: discussion about the homework typically does not qualify for participation, but there are exceptions. For example, if you find an error in the homework and are the first to point it out to us, congratulations! We consider that participation.
How much participation is enough? Once a week on average, either in class or on the discussion board, is sufficient for a full grade.
Note that class attendance by itself is not graded as participation, and is not required.
You will submit a report describing what you did, submit your code, and test your trained predictor(s) on a test set with hidden labels. Evaluation of the project is based partly on the report and partly on the test set results. Time permitting, we will have short presentations/Q&A during the last lecture time.
More details about the project will be announced around week 3.