The projects are part of your design project, worth 2 credit points. As such, they run in parallel to the actual course, so be aware that the due dates for projects and homework may be very close to each other! Start early and do not procrastinate.
In this project, we hope you will use all the knowledge of computer architecture you have learned in this course to optimize the SHA256 hash function used in the Homework 5 CACoin mining program.
Recall that in HW5, you sped up CACoin mining by parallelizing the search for the appropriate nonce. However, some of you may have noticed that mining is slow not only because of the nonce search, but also because of the naive, unoptimized SHA256 calculation. In this project, you will use what you have learned in CA to speed up this naive SHA256 implementation.
Make sure you read through the entire specification before starting the project.
You will use GitLab to collaborate with your group partner, and Autolab will use the files from GitLab. Make sure that you have access to GitLab: in the group CS110_Projects you should have access to your Project 3 repository, and in the group CS110 you should have access to p3_framework. To merge the framework into your repository, run:
git remote add framework https://autolab.sist.shanghaitech.edu.cn/gitlab/cs110/p3_framework.git
git fetch framework
git merge framework/master
This project contains the same set of files as HW5. The SHA256 implementation you are optimizing is in
The SHA256 algorithm can be hard to understand without a background in cryptography. However, with what you have learned in Computer Architecture, you can find many obvious optimizations in the implementation we provide. Some possible approaches are listed below:
GCC provides several optimization flags that you can turn on. The flags available for the x86 ISA are listed here.
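As a hedged illustration of the ISA-specific side of these flags: options such as `-march=native` or `-mavx2` let GCC emit instructions that only some CPUs support, so it can be worth confirming that the machine running your code actually has them. The sketch below relies on the GCC/Clang x86 builtins `__builtin_cpu_init` and `__builtin_cpu_supports`; the function name `cpu_has_avx2` is hypothetical and not part of the framework.

```c
/* Hypothetical helper: returns 1 if the CPU executing this code supports
   AVX2, 0 otherwise. __builtin_cpu_init and __builtin_cpu_supports are
   GCC/Clang builtins that exist only when targeting x86. */
int cpu_has_avx2(void) {
    /* Strictly required only in code that may run before constructors
       (e.g. ifunc resolvers); harmless to call here. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
}
```

An assumed build line would be something like `gcc -O3 -march=native ...`. The Intel Xeon E5-2650 v3 listed for this project is a Haswell-generation part, which supports AVX2.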
The first and the easiest approach is to use multithreading to optimize this algorithm, with either
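SHA-256 chains the 512-bit blocks of one message strictly in sequence, so threads pay off mainly across independent messages (for example, different nonce candidates). Below is a minimal sketch assuming OpenMP as the threading library. To keep it short, the per-message work is only the SHA-256 message-schedule expansion, standing in for a full hash; `expand_schedules_parallel` is a hypothetical name.

```c
#include <stdint.h>

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define SSIG0(x)   (ROTR(x, 7) ^ ROTR(x, 18) ^ ((x) >> 3))
#define SSIG1(x)   (ROTR(x, 17) ^ ROTR(x, 19) ^ ((x) >> 10))

/* Expand the 64-word SHA-256 message schedule of one 512-bit block. */
static void expand_schedule(const uint32_t block[16], uint32_t w[64]) {
    for (int t = 0; t < 16; t++)
        w[t] = block[t];
    for (int t = 16; t < 64; t++)
        w[t] = SSIG1(w[t - 2]) + w[t - 7] + SSIG0(w[t - 15]) + w[t - 16];
}

/* The n blocks are independent, so the iterations are split across
   threads. Compile with -fopenmp; without it the pragma is ignored and
   the loop runs serially with the same result. */
void expand_schedules_parallel(uint32_t (*blocks)[16], uint32_t (*w)[64], int n) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        expand_schedule(blocks[i], w[i]);
}
```

Build with `gcc -O2 -fopenmp`; the number of threads can be controlled with the `OMP_NUM_THREADS` environment variable.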
Part of this algorithm is also a good candidate for SIMD instructions. You can also consider changing the inputs and outputs of the hash functions to make use of SIMD instructions, as long as the SHA-256 results remain correct.
Loop unrolling works very well in combination with SIMD instructions for this algorithm, so it is worth considering.
Part of this algorithm is also a good candidate for cache blocking, although this optimization is less obvious.
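One possible reading of "changing the input and output" is multi-buffer hashing: store word t of four independent messages side by side in one 128-bit register, so each SIMD instruction advances four hashes at once. The sketch below assumes this transposed layout and applies it to the message schedule with SSE2 intrinsics from `<emmintrin.h>`; `schedule_x4` and `ROTR4` are hypothetical names. The same pattern extends to eight lanes with the AVX2 `_mm256_*` counterparts.

```c
#include <emmintrin.h>
#include <stdint.h>

/* SSE2 has no vector rotate, so rotate each 32-bit lane right by n
   (a constant in 1..31) with two shifts and an OR. */
#define ROTR4(x, n) _mm_or_si128(_mm_srli_epi32((x), (n)), _mm_slli_epi32((x), 32 - (n)))

static inline __m128i sigma0_x4(__m128i x) {
    return _mm_xor_si128(_mm_xor_si128(ROTR4(x, 7), ROTR4(x, 18)), _mm_srli_epi32(x, 3));
}

static inline __m128i sigma1_x4(__m128i x) {
    return _mm_xor_si128(_mm_xor_si128(ROTR4(x, 17), ROTR4(x, 19)), _mm_srli_epi32(x, 10));
}

/* in[lane][t]: four independent 16-word blocks.
   out[t][lane]: the four 64-word schedules, transposed so that word t of
   every message sits in one vector. */
void schedule_x4(uint32_t in[4][16], uint32_t out[64][4]) {
    __m128i w[64];
    for (int t = 0; t < 16; t++) /* lane 0 holds message 0, lane 3 holds message 3 */
        w[t] = _mm_set_epi32((int)in[3][t], (int)in[2][t], (int)in[1][t], (int)in[0][t]);
    for (int t = 16; t < 64; t++)
        w[t] = _mm_add_epi32(_mm_add_epi32(sigma1_x4(w[t - 2]), w[t - 7]),
                             _mm_add_epi32(sigma0_x4(w[t - 15]), w[t - 16]));
    for (int t = 0; t < 64; t++)
        _mm_storeu_si128((__m128i *)out[t], w[t]);
}
```

The transposition costs a little at the input and output, but every step in between is a plain lane-wise operation with no shuffles.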
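A common unrolling trick for the compression rounds is to unroll eight rounds and rotate the macro arguments instead of moving a..h down by one slot every round, so each round writes only two variables and the eight-round pattern repeats exactly. This is a sketch of one SHA-256 compression (round function as in FIPS 180-4) with the hypothetical names `sha256_compress_unrolled` and `ROUND`; it is scalar, but the same structure carries over per lane to a SIMD version.

```c
#include <stdint.h>

/* SHA-256 round constants (FIPS 180-4, section 4.2.2). */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

#define ROTR(x, n)   (((x) >> (n)) | ((x) << (32 - (n))))
#define BSIG0(x)     (ROTR(x, 2) ^ ROTR(x, 13) ^ ROTR(x, 22))
#define BSIG1(x)     (ROTR(x, 6) ^ ROTR(x, 11) ^ ROTR(x, 25))
#define SSIG0(x)     (ROTR(x, 7) ^ ROTR(x, 18) ^ ((x) >> 3))
#define SSIG1(x)     (ROTR(x, 17) ^ ROTR(x, 19) ^ ((x) >> 10))
#define CH(x, y, z)  (((x) & (y)) ^ (~(x) & (z)))
#define MAJ(x, y, z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))

/* One round, reading the schedule array w of the enclosing function.
   Only d (which becomes the new e) and h (which becomes the new a) are
   written; the caller renames the other six by rotating the arguments. */
#define ROUND(a, b, c, d, e, f, g, h, r) do {                     \
        uint32_t t1 = (h) + BSIG1(e) + CH(e, f, g) + K[r] + w[r]; \
        uint32_t t2 = BSIG0(a) + MAJ(a, b, c);                    \
        (d) += t1;                                                \
        (h) = t1 + t2;                                            \
    } while (0)

/* Processes one 512-bit block, given as 16 message words already loaded
   from the big-endian byte stream, and updates state in place. */
void sha256_compress_unrolled(uint32_t state[8], const uint32_t block[16]) {
    uint32_t w[64];
    for (int t = 0; t < 16; t++)
        w[t] = block[t];
    for (int t = 16; t < 64; t++)
        w[t] = SSIG1(w[t - 2]) + w[t - 7] + SSIG0(w[t - 15]) + w[t - 16];

    uint32_t a = state[0], b = state[1], c = state[2], d = state[3];
    uint32_t e = state[4], f = state[5], g = state[6], h = state[7];

    /* 8 rounds per iteration: after 8 argument rotations every variable is
       back in its original role, so the loop body repeats unchanged. */
    for (int r = 0; r < 64; r += 8) {
        ROUND(a, b, c, d, e, f, g, h, r + 0);
        ROUND(h, a, b, c, d, e, f, g, r + 1);
        ROUND(g, h, a, b, c, d, e, f, r + 2);
        ROUND(f, g, h, a, b, c, d, e, r + 3);
        ROUND(e, f, g, h, a, b, c, d, r + 4);
        ROUND(d, e, f, g, h, a, b, c, r + 5);
        ROUND(c, d, e, f, g, h, a, b, r + 6);
        ROUND(b, c, d, e, f, g, h, a, r + 7);
    }

    state[0] += a; state[1] += b; state[2] += c; state[3] += d;
    state[4] += e; state[5] += f; state[6] += g; state[7] += h;
}
```

For example, starting from the standard initial hash values and compressing the single padded block of the message "abc" (first word 0x61626380, last word 0x00000018, all others zero) yields the well-known digest beginning ba7816bf.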
Your grade will be divided into two parts:
There are some things you need to keep in mind: SHA256 is used almost everywhere, by almost everyone, all the time in modern society, so many very fast implementations of it are available. You should not submit any existing implementation that was not written by you. However, you may refer to technical reports and articles describing the available algorithms and optimizations, and then implement your own. Also, you may not use the built-in instruction sets dedicated to accelerating cryptography, including Intel AES-NI or the Intel SHA extensions such as
When your project is done, please submit all the files including the framework to your remote GitLab repo by running the following commands.
$ git commit -a
$ git push origin master:master
Autolab will discard all other files except for
As in previous projects, upload your autolab.txt to Autolab to submit your project.
The time of your last push to the git repo counts as your submission time, also with respect to slip days. So do not commit to this repository after the due date unless you want to use slip days or are OK with getting fewer points.
You have to work on this project as a team. We invite you to use all of GitLab's features for your project, for example branches, issues, wiki, milestones, etc.
We require you to push very frequently to GitLab. In your commits we want to see how the code evolved. We do NOT want to see working code suddenly appear; this will make us suspicious.
We also require that all group members make substantial contributions to the project. This means that one group member should not finish the project alone; instead, distribute the work among all group members!
At the end of Project 3 we will interview all group members and discuss their contributions, to see if we need to modify the score for certain group members.
You may find the following resources helpful:
CPU: Intel Xeon E5-2650 v3, 2.3 GHz, 10 cores (20 threads). Details here.