## Project 3 - CACoin Mining Hash Function Optimization


The projects are part of your design project, worth 2 credit points. As such, they run in parallel to the actual course, so be aware that the due dates for projects and homework may be very close to each other. Start early and do not procrastinate.

In this project, we hope you can use everything you have learnt about computer architecture in this course to optimize the SHA256 hash function used in the homework 5 CACoin mining program.

Recall that in HW5 you sped up CACoin mining by parallelizing the search for an appropriate nonce. Some of you may have found that mining is slow not only because of the nonce search, but also because of the naive, unoptimized SHA256 calculation. In this project, you are going to use the knowledge you have gained in CA to speed up the naive SHA256 implementation.

### Getting started

Make sure you read through the entire specification before starting the project.

You will be using GitLab to collaborate with your group partner. Autolab will use the files from GitLab, so make sure that you have access to it. In the group CS110_Projects you should have access to your project 3 repository. Also, in the group CS110, you should have access to p3_framework.

1. Clone your p3 repository from GitLab.
2. In the repository add a remote repo that contains the framework files: git remote add framework https://autolab.sist.shanghaitech.edu.cn/gitlab/cs110/p3_framework.git
3. Go and fetch the files: git fetch framework
4. Now merge those files with your master branch: git merge framework/master
5. The rest of the git commands work as usual.

#### Files

This project contains the same set of files as HW5. The SHA256 implementation you are optimizing is in hash_functions/sha256.c.

### Optimization Techniques

The SHA256 algorithm is hard to understand in depth without a background in cryptography. However, using what you have learnt in Computer Architecture, you can find many obvious optimizations in the implementation we provide. Some possible approaches are listed below:

#### Compiler

GCC provides a number of optimization flags that can be turned on. The flags available for the x86 ISA are listed here.

#### Multithreading

The first and easiest approach is to use multithreading to optimize this algorithm, with either pthreads or OpenMP.
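
As a rough illustration of the OpenMP route, the sketch below splits a nonce range across threads and keeps the smallest matching nonce. It is only a sketch under assumptions: `toy_hash` is a made-up integer mixer standing in for the real SHA256 call, and `find_nonce` and its parameters are hypothetical names that do not come from the framework.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the real SHA256 call: a cheap integer mixer,
 * used only so that this sketch is self-contained. */
static uint32_t toy_hash(uint32_t header, uint32_t nonce)
{
    uint32_t x = header ^ (nonce * 2654435761u);
    x ^= x >> 16;
    x *= 2246822519u;
    x ^= x >> 13;
    return x;
}

/* Scan nonces in [0, limit) and return the smallest one whose hash has at
 * least `zero_bits` leading zero bits, or `limit` if there is none. */
static uint32_t find_nonce(uint32_t header, uint32_t limit, unsigned zero_bits)
{
    assert(zero_bits >= 1 && zero_bits <= 32);
    uint32_t best = limit;

    /* Each thread scans its own slice of the range; the min reduction
     * (OpenMP 3.1 or later) merges the per-thread results. */
    #pragma omp parallel for reduction(min : best) schedule(static)
    for (uint32_t nonce = 0; nonce < limit; nonce++) {
        if ((toy_hash(header, nonce) >> (32 - zero_bits)) == 0 && nonce < best)
            best = nonce;
    }
    return best;
}
```

With GCC the pragma only takes effect when compiling with `-fopenmp`; without it the loop runs serially and returns the same result. Note that this version always scans the whole range, whereas a real miner would probably want to stop once a valid nonce is found.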

#### SIMD instructions

Parts of this algorithm are also good candidates for SIMD instructions. You can also consider changing the input and output of the hash functions to make use of SIMD instructions, as long as the SHA-256 results stay correct.
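
One way to picture this is the small sigma0 function of the SHA-256 message schedule, sigma0(x) = ROTR7(x) XOR ROTR18(x) XOR SHR3(x). The sketch below applies it to four independent 32-bit words at once with SSE2 intrinsics. The function names are hypothetical and not part of the framework, and the scalar fallback exists only so the sketch still compiles on machines without SSE2.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: sigma0(x) = ROTR7(x) ^ ROTR18(x) ^ SHR3(x). */
static uint32_t sigma0_scalar(uint32_t x)
{
    return ((x >> 7) | (x << 25)) ^ ((x >> 18) | (x << 14)) ^ (x >> 3);
}

/* Apply sigma0 to n words (n must be a multiple of 4), four words per
 * iteration when SSE2 is available. */
static void sigma0_x4(const uint32_t *in, uint32_t *out, int n)
{
    assert(n % 4 == 0);
#if defined(__SSE2__)
    for (int i = 0; i < n; i += 4) {
        __m128i x = _mm_loadu_si128((const __m128i *)(in + i));
        /* SSE2 has no 32-bit rotate, so build it from two shifts and an OR. */
        __m128i r7  = _mm_or_si128(_mm_srli_epi32(x, 7),  _mm_slli_epi32(x, 25));
        __m128i r18 = _mm_or_si128(_mm_srli_epi32(x, 18), _mm_slli_epi32(x, 14));
        __m128i s3  = _mm_srli_epi32(x, 3);
        _mm_storeu_si128((__m128i *)(out + i),
                         _mm_xor_si128(_mm_xor_si128(r7, r18), s3));
    }
#else
    for (int i = 0; i < n; i++)
        out[i] = sigma0_scalar(in[i]);
#endif
}
```

The four lanes could hold words from four different candidate messages (for example, four nonces), which is one possible reading of "changing the input and output" above.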

#### Loop unrolling

Loop unrolling works very well in combination with SIMD instructions for this algorithm, and you should think about it.
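
To make this concrete, here is a minimal sketch of unrolling the SHA-256 message schedule expansion by a factor of two. The function names are hypothetical, and the loop in the provided sha256.c may be organized differently. The unrolling is valid because w[i + 1] reads only w[i - 1], w[i - 6], w[i - 14] and w[i - 15], so it does not depend on w[i].

```c
#include <stdint.h>

static uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }
static uint32_t sig0(uint32_t x) { return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3); }
static uint32_t sig1(uint32_t x) { return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10); }

/* Rolled reference: one schedule word per iteration. */
static void expand_rolled(uint32_t w[64])
{
    for (int i = 16; i < 64; i++)
        w[i] = sig1(w[i - 2]) + w[i - 7] + sig0(w[i - 15]) + w[i - 16];
}

/* Unrolled by two: both words are computed from already known entries,
 * so the two computations are independent of each other. */
static void expand_unrolled(uint32_t w[64])
{
    for (int i = 16; i < 64; i += 2) {
        uint32_t a = sig1(w[i - 2]) + w[i - 7] + sig0(w[i - 15]) + w[i - 16];
        uint32_t b = sig1(w[i - 1]) + w[i - 6] + sig0(w[i - 14]) + w[i - 15];
        w[i]     = a;
        w[i + 1] = b;
    }
}
```

Besides halving the loop overhead, exposing two independent computations per iteration gives the compiler (or hand-written SIMD code) adjacent work that may be packed into vector lanes.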

#### Cache Blocking

Part of this algorithm is also a good candidate for cache blocking, but the optimization is less obvious.

### Grading Policy

1. You are allowed to re-use what you have implemented in HW5 to accelerate the mining process.
2. We will first run your code on small test cases on Autolab. If your program produces the correct result, you receive 60% of the points. A memory leak check with valgrind is also included in this stage.
3. After the due date, we will test your code on large test cases. Your grade for this part depends on the execution time of your code. We will mine a loooooot of CACoins there and take the weighted average of the mining time across the difficulty parameters. If you have earned the 60% in the first stage, the remaining 40% will be awarded based on the performance of your algorithm on the strong server.
4. Your code must not crash in either stage; otherwise you will receive no points for that stage.
5. Your submission must contain meaningful use of OpenMP and Intel SIMD intrinsics. Otherwise, you will get 0 points for both stages. This check will be done manually after the deadline, so there will be no feedback on it from the auto-grader.

There is something you need to keep in mind: the SHA256 algorithm is used almost everywhere in modern society, so many very fast implementations are available. You must not submit any existing implementation that was not written by you. You may, however, refer to technical reports and articles on the available algorithms and optimizations and implement your own. Also, you CANNOT use the built-in instruction sets dedicated to accelerating SHA-256, including Intel AES-NI or the Intel SHA extensions such as SHA256RNDS2, SHA256MSG1, SHA256MSG2, etc.

### Submission

When your project is done, please submit all the files including the framework to your remote GitLab repo by running the following commands.

    $ git commit -a
    $ git push origin master:master


Autolab will discard all other files except for blockchain.c, hash_functions.c, hash_functions/sha256.c, hash_functions/sha256.h and Makefile.

#### How to Autolab

Similar to previous projects, upload your autolab.txt to Autolab to submit your project.

#### Submission Time Announcement

The time of your last submission to the git repo counts as your submission time, also with respect to slip days. So do not commit to this repo after the due date unless you want to use slip days or are OK with getting fewer points.

#### Collaborative Coding and Frequent Pushing

You have to work on this project as a team. We invite you to use all of the features of GitLab for your project, for example branches, issues, wiki, milestones, etc.

We require you to push very frequently to GitLab. In your commits we want to see how the code evolved. We do NOT want to see the working code suddenly appear - this will make us suspicious.

We also require that all group members make substantial contributions to the project. This means that no single group member should finish the project alone; instead, distribute the work among all group members!

At the end of Project 3 we will interview all group members and discuss their contributions, to see if we need to modify the score for certain group members.

### Appendix

You may find the following resources helpful:

### Server Configurations

#### Autolab Server (Just for your reference):

• CPUs: 2 * Intel Xeon E5-2690 v4 2.6 GHz, 14 cores (so 28 physical cores and 56 threads (logical cores) in total) Details here
• Memory: 256GB DDR4 2400MHz