Project 4: Map Reduce
Introduction
Project 4 is, to some extent, an extension of Project 3. Students are required to implement KNN removal with MapReduce in Python.
We assume students are already familiar with KNN removal (Project 3), Python (SI100) and Spark (discussion and lab sessions); no additional information will be provided on the web.
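The map/reduce split can be illustrated with a minimal pure-Python sketch. It assumes that KNN removal works as a statistical outlier filter:

- Map: compute each point's mean distance to its k nearest neighbours.
- Reduce: fold those values into a global mean and standard deviation.
- Filter: drop points whose mean distance exceeds mean + alpha·std.

That definition, and the names `k`, `alpha` and `knn_removal_mapreduce`, are assumptions for illustration only. Check them against utils/knnRemoval.py and demo.py before relying on them.

```python
import math
from functools import reduce

# ASSUMPTION: "KNN removal" is modelled as a statistical outlier filter.
# Verify the exact rule against utils/knnRemoval.py.


def mean_knn_distance(point, points, k):
    """Map step: mean Euclidean distance from `point` to its k nearest neighbours."""
    # Brute force over all points; the smallest distance is the point to itself (0.0).
    dists = sorted(math.dist(point, q) for q in points)
    neighbours = dists[1:k + 1]
    return sum(neighbours) / len(neighbours)


def to_partial(d):
    """Turn one mapped value into a (count, sum, sum of squares) partial result."""
    return (1, d, d * d)


def combine(a, b):
    """Reduce step: merge two partial results (associative and commutative)."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def knn_removal_mapreduce(points, k=2, alpha=1.0):
    """Keep points whose mean k-NN distance is at most mean + alpha * std."""
    mean_dists = [mean_knn_distance(p, points, k) for p in points]  # map phase
    n, s, ss = reduce(combine, map(to_partial, mean_dists), (0, 0.0, 0.0))  # reduce phase
    mu = s / n
    sigma = math.sqrt(max(ss / n - mu * mu, 0.0))  # population standard deviation
    threshold = mu + alpha * sigma
    return [p for p, d in zip(points, mean_dists) if d <= threshold]


pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
print(knn_removal_mapreduce(pts, k=2, alpha=1.0))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Because `combine` is associative and commutative, the same `to_partial`/`combine` pair could be handed to Spark as `rdd.map(to_partial).reduce(combine)`. The brute-force neighbour search is O(n²) and is the part most worth optimising.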
Tasks:
- Implement KNN removal with MapReduce
- Run your code on a computer cluster (not provided yet)
Dataset:
- Download from Project 3
Scripts
- utils/io.py: provides bin2nparray and nparray2bin
- utils/knnRemoval.py: a reimplementation of the Project 3 reference
- utils/mapreduceKnnRemoval.py: your MapReduce implementation
- demo.py: use its results as the ground truth
- Image comparison and plot scripts will not be provided
The whole project is provided in proj4.zip. Please put your .bin files in ./data.
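utils/io.py already handles the .bin files. For readers who want to see what such a conversion involves, here is a standard-library-only sketch.

The layout it assumes is a guess for illustration, not the documented format of the Project 3 data: a flat sequence of little-endian float32 values, with `dim` values per point. The helpers `write_points_bin`, `read_points_bin` and `list_inputs` are hypothetical, not the project's bin2nparray/nparray2bin.

```python
import struct
from pathlib import Path

# ASSUMPTION: flat little-endian float32 layout (x0, y0, x1, y1, ...); not confirmed by the project.


def write_points_bin(path, points):
    """Hypothetical writer: flatten the points and store them as float32."""
    flat = [c for p in points for c in p]
    Path(path).write_bytes(struct.pack(f"<{len(flat)}f", *flat))


def read_points_bin(path, dim=2):
    """Hypothetical reader for the same layout; `dim` coordinates per point."""
    data = Path(path).read_bytes()
    n_values = len(data) // 4  # 4 bytes per float32; trailing bytes are ignored
    flat = struct.unpack(f"<{n_values}f", data[: n_values * 4])
    return [tuple(flat[i:i + dim]) for i in range(0, n_values, dim)]


def list_inputs(data_dir="./data"):
    """Return the .bin files placed in ./data, sorted by name."""
    return sorted(Path(data_dir).glob("*.bin"))
```

Values that are not exactly representable in float32 will come back slightly rounded, so compare such round trips with a tolerance.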
Submission:
Check into Autolab:
- mapreduceKnnRemoval.py: your fast implementation.
Grading
We will use a small map to test your program. If your output is incompatible, you will receive 0 points! We will show you the output of your program, so keep it short. This way you will get a rough estimate of how fast your program is compared to those of other students. Keep in mind, though, that Autolab is a shared resource, so these timings may vary considerably.
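Incompatible output scores zero, and timings on the shared Autolab machine are noisy. It may therefore help to check correctness and speed locally first.

The sketch below compares two point lists element by element within a tolerance, and records the best of several wall-clock timings. The names `outputs_match` and `timed`, the tolerance, and the order-sensitive comparison are assumptions. The exact output format Autolab expects is not specified here.

```python
import math
import time


def outputs_match(ours, reference, tol=1e-6):
    """True if both lists hold the same points in the same order, within `tol`."""
    if len(ours) != len(reference):
        return False
    return all(
        len(p) == len(q) and all(math.isclose(a, b, abs_tol=tol) for a, b in zip(p, q))
        for p, q in zip(ours, reference)
    )


def timed(fn, *args, repeats=3):
    """Run fn(*args) `repeats` times; return (last result, best wall-clock seconds)."""
    best = math.inf
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return result, best
```

If the order of surviving points is not significant, sort both lists before calling `outputs_match`. Printing a single summary line with both timings keeps the output short, as requested above.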
Your program will be run on a 10-core machine (20 threads with Hyper-Threading) running Ubuntu 14.04. The speed of each program will be recorded.
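On a machine with 10 cores and 20 hardware threads, the per-point neighbour search is a natural place to parallelise, since each point is an independent task. Below is a minimal sketch using the standard library's `multiprocessing.Pool`.

It assumes that KNN removal needs each point's mean distance to its k nearest neighbours; check this against utils/knnRemoval.py. Further assumptions:

- The default of 20 workers and the chunk-size heuristic are arbitrary choices.
- The "fork" start method is POSIX-only (it is available on Ubuntu).
- `parallel_mean_knn` is a hypothetical name.

```python
import math
import multiprocessing as mp
from functools import partial


def mean_knn_distance(point, points, k):
    """Mean distance from `point` to its k nearest neighbours (brute force, self excluded)."""
    dists = sorted(math.dist(point, q) for q in points)
    neighbours = dists[1:k + 1]
    return sum(neighbours) / len(neighbours)


def parallel_mean_knn(points, k, workers=20):
    """Map phase spread over `workers` processes; results keep the input order."""
    task = partial(mean_knn_distance, points=points, k=k)
    # About four chunks per worker: fewer pickling round trips, still load-balanced.
    chunksize = max(1, len(points) // (workers * 4))
    # "fork" (POSIX only) lets workers inherit this module without re-importing it.
    with mp.get_context("fork").Pool(processes=workers) as pool:
        return pool.map(task, points, chunksize=chunksize)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
    print(parallel_mean_knn(pts, k=2, workers=2))
```

`os.cpu_count()` reports logical CPUs, so on the grading machine it would likely return 20. Whether 10 or 20 worker processes is faster for this CPU-bound workload is worth measuring rather than assuming.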
The slowest 33% of programs will receive a score of 80%, and the fastest 15% will receive a score of 100%. All other programs will receive a score scaled linearly between those two values.
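One possible reading of this rule as code takes `p`, a program's position in the speed ranking as a fraction (0.0 fastest, 1.0 slowest). How ties and the exact boundaries are handled is an assumption, and `speed_score` is a hypothetical name.

```python
def speed_score(p, fast_cut=0.15, slow_cut=0.67):
    """Fraction of the points awarded for ranking position p (0.0 = fastest, 1.0 = slowest).

    Up to fast_cut (fastest 15%): full marks. From slow_cut on (slowest 33%): 80%.
    """
    if p <= fast_cut:
        return 1.0
    if p >= slow_cut:
        return 0.8
    # Linear interpolation: 1.0 at fast_cut down to 0.8 at slow_cut.
    return 1.0 - 0.2 * (p - fast_cut) / (slow_cut - fast_cut)


print(speed_score(0.41))  # halfway between the cut-offs, about 0.9
```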
This project is worth only half as much as Project 1.1, 1.2, 2.1 or 2.2 (this project: 11.11%; each of the other (partial) projects: 22.22%).
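The two percentages are consistent with a 1 : 2 weighting. If the four listed projects and this one together make up the full project grade (an assumption), each half-unit is one ninth:

```latex
\frac{1}{9} \approx 11.11\%, \qquad
\frac{2}{9} \approx 22.22\%, \qquad
4 \cdot \frac{2}{9} + \frac{1}{9} = \frac{9}{9} = 100\%
```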