Report outline
- Data exploration. What data properties influenced your choice of NN method?
- Method(s): Describe the methods you used in enough detail to make the work reproducible. Assume that the readers already know the methods (e.g. you do not need to describe the LSH method) but do not know which variant of the method you used or with what parameters. In other words, your report need not reproduce the course materials; it needs to complement them with the specifics of what you did. For full credit, please define any parameters that are not defined in the course materials.
- Strategy. Reproducible description of how you traded off time, accuracy, effort on your part, etc. What was your reasoning? If you tried multiple approaches, what worked and what failed? Be explicit in describing the kind of failure (or success) you observed. Note: many of you encountered problems with Colab. Mention these, since they influenced your choices and your results, but remember that they are not the focus of the project.
- Experimental results. Some of these should reflect/intersect with training strategy. This part is your analytical work to validate the methods you developed. The results should go beyond running your method on the given test set.
For example, describe what experiments you did to optimize your method. How did the results influence your final algorithm?
Extract features of your methods or data that are interpretable and plottable. Be selective in what you show! Credit will be given for careful analysis or visualization of the results.
- Estimate of the average retrieval metrics: time, false positive rate, and false negative rate. Optionally, give prediction intervals within which you believe your metrics will fall, and explain how you estimated them.
- Optional: references
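As an illustration of the retrieval-metric estimates requested above, here is a minimal sketch for your own analysis (the report itself should contain no code). It assumes, hypothetically, that each query yields a boolean prediction and a boolean ground-truth label; the official definitions of false positive and false negative are the ones given on Canvas. The function names `error_rates` and `bootstrap_interval` are illustrative, and the percentile bootstrap interval for the mean is only one possible way to quantify uncertainty.

```python
import random


def error_rates(predicted, actual):
    """Return (false_positive_rate, false_negative_rate).

    Assumption: `predicted` and `actual` are equal-length lists of booleans,
    one entry per query. Adapt this to the definitions posted on Canvas.
    """
    pairs = list(zip(predicted, actual))
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if a and not p)
    negatives = sum(1 for _, a in pairs if not a)
    positives = sum(1 for _, a in pairs if a)
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


def bootstrap_interval(values, n_resamples=1000, alpha=0.1, seed=0):
    """Percentile bootstrap interval for the mean of `values`.

    `values` could be per-query times or per-batch error rates.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    return lower, upper
```

If you report such an interval, state the number of resamples, the level `alpha`, and the quantity that was resampled, so that the estimate is reproducible.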
Total length: no more than 5 pages of content, with extra pages containing references or figures, up to no more than 10 pages total. Do not include code in the report.
In writing the report, assume that the readers (= instructor and TA) are very familiar with all the methods and with CS/machine learning terminology; there is no need to reproduce textbook-like definitions (and there would be no space for them). What the readers need to know are the specifics of what you did with these methods and why: what parameters you used, what inputs (if applicable), and whether there were any variations from the standard methods. For example, if you use a software package, assume we don't know what variant of the "textbook" method the package implements, or what the parameters mean. You need to specify these in your report.
How to submit your query set results
- Instructions to be posted on Canvas
- We will use a script to download and evaluate your results, so please do not vary from the format and file name, lest your results be distorted.
- Scoring of your results is described on Canvas (see Project module). It takes into account the rates of false positives, false negatives, and time elapsed.
Timeline: TBD