Project 4: Map Reduce

Computer Architecture I, ShanghaiTech University

Introduction

Project 4 is, to some extent, an extension of Project 3. Students are required to implement KNN removal with MapReduce in Python.

We assume that students are already very familiar with KNN removal (Project 3), Python (SI100) and Spark (discussions and labs), so no extra information will be provided on the web.
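To show how a KNN removal pass might be split into map and reduce steps, here is a minimal pure-Python sketch. It assumes that "KNN removal" means a statistical outlier filter: each point is scored by its mean distance to its k nearest neighbors, and points whose score exceeds mean + std_mul * std over all scores are dropped. This assumption, and the names `mean_knn_distance`, `knn_removal`, `k` and `std_mul`, are hypothetical and not taken from the Project 3 or Project 4 handouts. The neighbor search is brute force and the input is assumed to be non-empty.

```python
import math
from functools import reduce


def mean_knn_distance(i, points, k):
    """Map step (hypothetical): mean distance from points[i] to its k nearest neighbors."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    nearest = dists[:k]
    # Average over the neighbors actually found (at most k)
    return sum(nearest) / len(nearest) if nearest else 0.0


def knn_removal(points, k, std_mul):
    """Keep points whose mean k-NN distance is at most mean + std_mul * std (assumed rule)."""
    # Map: index i -> (point, mean distance to its k nearest neighbors)
    scored = list(map(lambda i: (points[i], mean_knn_distance(i, points, k)),
                      range(len(points))))
    # Reduce: fold all per-point scores into (count, sum, sum of squares)
    n, total, total_sq = reduce(
        lambda acc, d: (acc[0] + 1, acc[1] + d, acc[2] + d * d),
        (d for _, d in scored),
        (0, 0.0, 0.0),
    )
    mean = total / n
    std = math.sqrt(max(total_sq / n - mean * mean, 0.0))
    threshold = mean + std_mul * std
    # Filter: drop points whose neighborhood is unusually sparse
    return [p for p, d in scored if d <= threshold]


if __name__ == "__main__":
    cluster = [(x, y) for x in range(3) for y in range(3)]  # 3x3 grid
    kept = knn_removal(cluster + [(100, 100)], k=3, std_mul=1.0)
    print(len(kept))  # 9: the far-away point (100, 100) is removed
```

In PySpark the same three stages would roughly correspond to `RDD.map`, `RDD.reduce` (or `RDD.aggregate`) and `RDD.filter`; a real implementation would also need a faster neighbor search than the all-pairs loop used here.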

Tasks:

Dataset:

Scripts

There is one minor difference between the C++ and Python versions in how the mean is computed: the C++ version always divides by k, while the Python version takes the true mean. With the same setting, 30_1.5_15, on cropped.bin, the C++ version's result contains 73908 zeros while the Python version's result contains 73912 zeros.
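The two mean conventions can be contrasted in a few lines. This sketch assumes the difference only matters when fewer than k distances are available to average; the function names are hypothetical and do not come from the provided scripts.

```python
def mean_cpp_style(neighbor_dists, k):
    # C++ convention (as described above): always divide by k
    return sum(neighbor_dists) / k


def mean_python_style(neighbor_dists):
    # Python convention: true mean over the distances actually present
    return sum(neighbor_dists) / len(neighbor_dists)


if __name__ == "__main__":
    dists = [1.0, 2.0, 3.0]  # hypothetical: only 3 neighbors found, k = 5
    print(mean_cpp_style(dists, 5))  # 1.2
    print(mean_python_style(dists))  # 2.0
```

The two agree whenever exactly k distances are averaged and differ otherwise, which is enough to shift a handful of points across the removal threshold.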

The complete project can be found in proj4.zip; please put your .bin files in ./data. (Updated June 7)

Submission:

Check into Autolab. Do NOT check in any of the big test files; you will lose 10% of your score if you (ever) do so!

Grading

We will use a small map to test your program. If your output is incompatible, you will receive 0 points! We will show you the output of your program, so keep it short. This way you will get a rough estimate of how fast your program is compared to those of other students. Keep in mind, however, that Autolab is a shared resource, so these timings may vary considerably.

Your program will (hopefully) be run on a big cluster with many nodes, and the speed of each program will be recorded.
Programs in the slowest 33 percent and below receive a score of 80%. Programs in the fastest 15 percent receive a score of 100%. All other programs receive a score linearly scaled between those two values.
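The scoring rule can be written out as a small function. This is a sketch under two assumptions the text does not settle: `percentile` is the percentage of submissions that are slower than yours (so the fastest 15 percent starts at 85), and the linear scaling is over percentile rank rather than raw runtime. The function name is hypothetical.

```python
def speed_score(percentile):
    """Score in percent for a submission faster than `percentile` percent of all submissions."""
    if percentile <= 33:   # slowest 33 percent and below
        return 80.0
    if percentile >= 85:   # fastest 15 percent
        return 100.0
    # Linear interpolation between 80% (at 33) and 100% (at 85)
    return 80.0 + 20.0 * (percentile - 33) / (85 - 33)


if __name__ == "__main__":
    print(speed_score(59))  # 90.0, halfway between the two cut-offs
```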