Project 4: Map Reduce

Computer Architecture I ShanghaiTech University

Introduction

Project 4 is, to some extent, an extension of Project 3. Students are required to implement KNN removal with MapReduce in Python.
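As a rough illustration of the map and reduce steps only, here is a minimal sketch in plain Python. It is not the reference solution. Project 3 is not described on this page, so the sketch assumes that "KNN removal" means dropping points whose mean distance to their K nearest neighbours exceeds a threshold. The names `map_pairs`, `merge_k_smallest`, `reduce_by_key` and `knn_removal`, and the values of `K` and `THRESHOLD`, are hypothetical. `reduce_by_key` is a plain-Python stand-in for an operation such as Spark's `reduceByKey`.

```python
import heapq
import math
from collections import defaultdict
from functools import reduce

K = 2            # hypothetical neighbour count (assumption)
THRESHOLD = 2.0  # hypothetical cutoff on the mean neighbour distance (assumption)


def map_pairs(points):
    """Map step: emit (point index, [distance to one other point])."""
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if i != j:
                yield i, [math.dist(p, q)]


def merge_k_smallest(a, b):
    """Reduce step: merge two distance lists, keeping only the K smallest."""
    return heapq.nsmallest(K, a + b)


def reduce_by_key(pairs, func):
    """Plain-Python stand-in for a reduce-by-key operation."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}


def knn_removal(points):
    """Keep points whose mean distance to their K nearest neighbours is at most THRESHOLD."""
    nearest = reduce_by_key(map_pairs(points), merge_k_smallest)
    return [p for i, p in enumerate(points)
            if sum(nearest[i]) / len(nearest[i]) <= THRESHOLD]


# Three clustered points and one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
kept = knn_removal(points)
```

With these assumed values, the far-away point `(10.0, 10.0)` is dropped and the three clustered points are kept. The brute-force map step compares every pair of points, which is only reasonable for tiny inputs.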

We assume that students are already very familiar with KNN removal (Project 3), Python (SI100) and Spark (discussion and lab). No extra information will be provided on the web.

Tasks:

Dataset:

Scripts

The whole project can be found in proj4.zip. Please put your .bin file in ./data

Submission:

Check in to Autolab: Do NOT check in any of those big test files - you will lose 10% of your score if you (ever) do so!

Grading

We will use a small map to test your program. If your output is incompatible you will receive 0 pts! We will show you the output of your program - keep it short! This way you will have a rough estimate of how fast you are compared to the other students. But keep in mind that Autolab is a shared resource, so those values might differ a lot between runs.

Your program will be run on a 10-core machine (20 hardware threads with Hyper-Threading) running Ubuntu 14.04. The speed of each program will be recorded.
Programs in the slowest 33 percent will receive a score of 80%. Programs in the fastest 15 percent will receive a score of 100%. All other programs will get a score that is linearly scaled between those two values.
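
The scaling can be sketched as a small function. This is one reading of the rule, not the official grading script: it assumes speed is expressed as a percentile rank where 0 is the slowest program and 100 the fastest, so the fastest 15 percent start at rank 85. The name `speed_score` is hypothetical.

```python
def speed_score(percentile):
    """Return the score in percent for a speed percentile (0 = slowest, 100 = fastest)."""
    slow_cutoff = 33.0  # at or below this rank: 80%
    fast_cutoff = 85.0  # at or above this rank (fastest 15 percent): 100%
    if percentile <= slow_cutoff:
        return 80.0
    if percentile >= fast_cutoff:
        return 100.0
    # Linear interpolation between the two cutoffs.
    return 80.0 + 20.0 * (percentile - slow_cutoff) / (fast_cutoff - slow_cutoff)


print(speed_score(59))  # halfway between the cutoffs: 90.0
```

Under this reading, a program at the 59th percentile sits halfway between the two cutoffs and would receive 90%.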

This project is worth only half as much as Project 1.1, 1.2, 2.1 or 2.2 (this project: 11.11%; each of the other (partial) projects: 22.22%).