Project 3: Performance Programming

Computer Architecture I ShanghaiTech University
Project 2.2 Project 3 Project 4


Please read to last years Project 3 description. It was about filling missing pixels in a 3D depth image: Project 3 2016. SIST has a 3D laser scanner (Faro Focus 3D X330) that can also gather the color of the 3D points. More information on 3D scanning can be found on Wikipedia.
For this project we want to post-process such scans. We will be working with a scan taken in the lobby of the new SIST building (see Fig. 1). This scan has a resolution of 20265 x 8533 pixel - that are about 173 million pixel. A video showing this and 7 more scans in the lobby is available here (299MiB).

Since this is a 3D scan we not only get the colors of the pixel but also the distance of that pixel from the camera. For this project, this information is encoded in a separate file which first saves the width and height of the depth image as 32bit unsigned integers (little-endian) followed by the depth information as floats in meter. The file can be downloaded below.

Below also a tool is available to convert such a depth image to a png. Figure 2 shows the depth image of the raw input scan. You may notice that there are many pixels (more than a million) which are black. For those pixels the scanner could not determine the distance.

It should be mentioned that far away points are more likely to get removed, because their neighbors are more far away - thus the algorithm might not be an ideal one. But it's good for a CA project.

Fig. 1: The scan of the lobby of the new SIST building in color. Full resolution (154MiB).

Fig. 2: The depth image of that scan capped at 60m. The darker the closer to the camera. Completely black pixels are error points. Full resolution (7MiB).

Fig. 3: Depth image afer KNN removal (30 neighbors, distance threashold 1.5m, kernel size 15). Computation time on Sören's Mac: 14.7 minutes. Number of outliers: 51,157. Full resolution (7MiB).

Your Job:

There are outlier points in the cloud. Those are points with very few or no neighbors. Those outliers can be annoying, we will be using KNN to remove them. KNN is K nearest neighbor removal. DO NOT WIKI IT as that would be K nearest neighbor clustering. Don't worry, we provide the reference implementation. K nearest neighbor removal will rule pixels as outliers if they are too far away from there nearest neighbors. Here is (a similar) algorithm:

    class Image{
        void kNearestNeighborRemoval(const int, const int);
        std::map<Pixel, int> pixels_dist;
        PixelList getKNearestNeighbors(const Pixel&, const int);
        void resetPixelDepth(const Pixel&);
        // ... Some other stuff ...

    void Image::kNearestNeighborRemoval(const int k, const int thr_dist){
        for (Pixel p = this->begin(); p != this->end(); p++){
            PixelList pl = this->getKNearestNeighbors(p, k);
            int avg_dist = pl.takeAverage();
            if (avg_dist > thr_dist) {

    double Image::getNearestNeighborNotInList(PixelList &list, const Pixel& center){
        double dist = std::numeric_limits::max();
        Pixel candidate;
        for (Pixel p = this.begin(); p != this.end(); p++){
            if(p.valid() && list.notInList(p) && center.calculateDistance(p) < dist){
                candidate = p;
                dist = center.calculateDistance(p);
        return candidate;

    PixelList Image::getKNearestNeighbors(const Pixel& center, const int k){
        PixelList pl(k);
        Pixel nearest;
        for (int i = 0; i< k; i++){
            nearest = getNearestNeighborNotInList(pl, center);
        return pl;

    void Image::resetPixelDepth(Pixel& p){

Below are some notices and details we will be using to implement this algorithm:

Important Notice! The code above is just a demo showing you the most important core of this algorithm, you are not obligated to use OOP (though its a good practice). In fact, the reference code doesn't use OOP either. Don't constrain yourselves to the reference code, you are more than welcome to implement it from scratch.

Reference Program:




In the computer architecture lecture we have learned several techniques to make programs faster. Use them to implement the algorithm as fast as possible. You can change whatever you like - it is only important that your output is the same as the reference program. You may not use any specialized math library - SSE and openMP should be enough.

Do NOT set the number of threads of openMP by hand - it will be set by the environment variable! We will punish you if you do.

The simplest approach is to take the reference implementation and start changing/ optimizing it.

Take care of what makes sense to optimize and what not. For example the reading and writing of the images is not part of the timing - so you do not need to change anything there.


As a tar file please check into autolab: Do NOT check in any of those big test files - you will loose 10% of your score if you (ever) do so!


We will use a small map to test your program. If your output is incompatible you will receive 0 pts! We will show you the output of your program - keep it short! This way you will have a rough estimate how fast you are compared to the other students. But keep in mind that the autolab is a shared resource, so those values might differ a lot.

Your program will be run on a 10 core (with HyperThreading 20) machine, running Ubuntu 14.04. The speed of each program will be noted.
The speed of each program will be noted.
The slowest 33 percentile and below will receive a score of 80%. The fastest 15 percentile will receive a score of 100%. All other programs will get a score that is linearly scaled between those values.