Project 3 - Computer Architecture I - ShanghaiTech University

Project 3 - CACoin Mining Hash Function Optimization

Computer Architecture I | ShanghaiTech University

IMPORTANT INFO - PLEASE READ

The projects are part of your design project, which is worth 2 credit points. As such, they run in parallel to the actual course, so be aware that the due dates for projects and homework might be very close to each other! Start early and do not procrastinate.

In this project, we hope you can use everything you have learned about computer architecture in this course to optimize the SHA256 hash function used in the Homework 5 CACoin mining program.

Recall that in HW5 you sped up CACoin mining by parallelizing the search for an appropriate nonce. Some of you may have noticed that mining is slow not only because of the nonce search, but also because of the naive, unoptimized SHA256 calculation. In this project, you will use what you have learned in CA to speed up the naive SHA256 implementation.

Getting started

Make sure you read through the entire specification before starting the project.

You will be using GitLab to collaborate with your group partner. Autolab will use the files from GitLab, so make sure that you have access to it. In the group CS110_Projects you should have access to your project 3 repository. In the group CS110 you should also have access to p3_framework.

Obtain your files

Clone your p3 repository from GitLab.
In the repository add a remote repo that contains the framework files: git remote add framework https://autolab.sist.shanghaitech.edu.cn/gitlab/cs110/p3_framework.git
Go and fetch the files: git fetch framework
Now merge those files with your master branch: git merge framework/master
The rest of the git commands work as usual.

Files

This project contains the same set of files as HW5. The SHA256 implementation you are optimizing is in hash_functions/sha256.c.

Optimization Techniques

The SHA256 algorithm is hard to understand fully without background knowledge in cryptography. Still, with what you have learned in Computer Architecture, you can find many obvious optimizations in the implementation we provide. Some possible approaches are listed below:

Compiler

GCC provides a number of optimization flags that you can turn on. The flags available for the x86 ISA are listed in the GCC documentation.
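One way to confirm that the flags you added actually took effect is to inspect the compiler's predefined macros. The sketch below is only an illustration: the function name build_features and the bitmask layout are made up for this example, while __OPTIMIZE__, __AVX2__ and _OPENMP are macros that GCC defines for optimizing builds, AVX2 code generation (for example with -mavx2) and -fopenmp respectively.

```c
#include <stdio.h>

/* Hypothetical helper: returns a bitmask describing how this file was built.
 * bit 0: optimization enabled, bit 1: AVX2 enabled, bit 2: OpenMP enabled. */
static unsigned build_features(void)
{
    unsigned mask = 0;
#ifdef __OPTIMIZE__ /* defined by GCC in optimizing compilations */
    mask |= 1u;
#endif
#ifdef __AVX2__     /* defined when AVX2 code generation is enabled */
    mask |= 2u;
#endif
#ifdef _OPENMP      /* defined when OpenMP support is enabled */
    mask |= 4u;
#endif
    return mask;
}

/* Prints the result in a readable form, e.g. during a debug run. */
static void print_build_features(void)
{
    unsigned m = build_features();
    printf("optimized=%u avx2=%u openmp=%u\n",
           m & 1u, (m >> 1) & 1u, (m >> 2) & 1u);
}
```

Calling print_build_features() once at startup of a local debug build is enough to spot a Makefile that silently dropped a flag.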

Multithreading

The first and easiest approach is to use multithreading, with either pthreads or OpenMP.
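As a rough illustration of the OpenMP route, the sketch below splits a nonce range across threads and merges the per-thread results with a min-reduction. It is not the framework's code: toy_hash is a hypothetical stand-in mixer (NOT SHA-256), and find_nonce and its parameters are invented for this example. Compiled without -fopenmp, the pragma is ignored and the loop simply runs serially.

```c
#include <stdint.h>

/* Hypothetical stand-in for the real hash: a cheap 32-bit mixer, NOT SHA-256. */
static uint32_t toy_hash(uint32_t header, uint32_t nonce)
{
    uint32_t x = header ^ (nonce * 0x9E3779B9u);
    x ^= x >> 16;
    x *= 0x85EBCA6Bu;
    x ^= x >> 13;
    x *= 0xC2B2AE35u;
    x ^= x >> 16;
    return x;
}

/* Returns the smallest nonce in [0, limit) whose toy hash starts with
 * zero_bits zero bits (1 <= zero_bits <= 32), or limit if there is none. */
static uint32_t find_nonce(uint32_t header, uint32_t limit, unsigned zero_bits)
{
    uint32_t best = limit;

    /* Each thread scans its own slice of the range and keeps a private
     * copy of best; the reduction takes the minimum over all threads. */
    #pragma omp parallel for reduction(min : best) schedule(static)
    for (uint32_t nonce = 0; nonce < limit; nonce++) {
        /* Once a thread has a hit, larger nonces in its slice are skipped
         * without hashing, but the loop itself still runs to the end. */
        if (nonce < best &&
            (toy_hash(header, nonce) >> (32 - zero_bits)) == 0)
            best = nonce;
    }
    return best;
}
```

A parallel for loop cannot break out early, which is why this sketch filters on nonce < best instead; a real miner would likely process the range in chunks and stop after the first chunk that yields a hit.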

SIMD instructions

Part of this algorithm is also a good candidate for SIMD instructions. You can also think of changing the input and output of the hash functions to make use of SIMD instructions while keeping the correctness of SHA-256 evaluation.
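One reading of "changing the input and output" is to hash several independent messages at once, with one message per SIMD lane. The sketch below applies the SHA-256 message-schedule function sigma0 (ROTR 7 xor ROTR 18 xor SHR 3) to four 32-bit words in parallel using SSE2 intrinsics. The function and macro names are made up for this example, and it covers only this one building block, not a full hash.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar reference: SHA-256 small sigma0 = ROTR7 ^ ROTR18 ^ SHR3. */
static uint32_t sigma0_scalar(uint32_t x)
{
    return ((x >> 7) | (x << 25)) ^ ((x >> 18) | (x << 14)) ^ (x >> 3);
}

/* SSE2 has no 32-bit rotate, so build one from two shifts and an OR. */
#define ROTR32X4(v, n) \
    _mm_or_si128(_mm_srli_epi32((v), (n)), _mm_slli_epi32((v), 32 - (n)))

/* sigma0 on four 32-bit lanes at once. */
static __m128i sigma0_x4(__m128i v)
{
    return _mm_xor_si128(_mm_xor_si128(ROTR32X4(v, 7), ROTR32X4(v, 18)),
                         _mm_srli_epi32(v, 3));
}

/* Applies sigma0 to one word taken from each of four independent messages. */
static void sigma0_four_lanes(const uint32_t in[4], uint32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, sigma0_x4(v));
}
```

The same pattern extends to eight lanes with the AVX2 equivalents of these intrinsics, provided the build enables AVX2.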

Loop unrolling

Loop unrolling works very well in combination with SIMD instructions for this algorithm, and you should think about it.
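As a small example, consider the SHA-256 message schedule, where each word w[i] for i from 16 to 63 is computed from w[i-2], w[i-7], w[i-15] and w[i-16]. Two consecutive words do not depend on each other, so the loop can be unrolled by two without changing the result. The sketch below assumes nothing about how the provided sha256.c is written; the function names are hypothetical.

```c
#include <stdint.h>

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define SIG0(x) (ROTR(x, 7) ^ ROTR(x, 18) ^ ((x) >> 3))
#define SIG1(x) (ROTR(x, 17) ^ ROTR(x, 19) ^ ((x) >> 10))

/* Straightforward version: one schedule word per iteration.
 * w[0..15] must already hold the message block. */
static void schedule_rolled(uint32_t w[64])
{
    for (int i = 16; i < 64; i++)
        w[i] = SIG1(w[i - 2]) + w[i - 7] + SIG0(w[i - 15]) + w[i - 16];
}

/* Unrolled by two: w[i] and w[i+1] only read words below index i,
 * so both can be computed before either is stored. */
static void schedule_unrolled2(uint32_t w[64])
{
    for (int i = 16; i < 64; i += 2) {
        uint32_t a = SIG1(w[i - 2]) + w[i - 7] + SIG0(w[i - 15]) + w[i - 16];
        uint32_t b = SIG1(w[i - 1]) + w[i - 6] + SIG0(w[i - 14]) + w[i - 15];
        w[i] = a;
        w[i + 1] = b;
    }
}
```

Unrolling halves the loop overhead and exposes two independent computations per iteration, which the compiler or hand-written SIMD code can then overlap.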

Cache Blocking

Part of this algorithm is also a good candidate for cache blocking. But the optimization is not so obvious.

Grading Policy

Your grade will be divided into two parts:

You are allowed to re-use what you have implemented in HW5 to accelerate the mining process.
We will first run your code on small test cases on Autolab. If your program produces the correct result, you receive 60% of the points. A memory leak check with valgrind is also included in this stage.
After the due date, we will test your code on large test cases. Your grade on this part depends on the execution time of your code. We will mine a loooooot of CACoins there and take the weighted average of the mining times with respect to the difficulty parameters. If you have earned the 60% in step 1, the remaining 40% will be awarded based on the performance of your code on the server used for the final speed test.
Your code must not crash in either stage, or you will receive no points for that stage.
Your submission must contain meaningful use of OpenMP and Intel SIMD intrinsics. Otherwise, you will get 0 points for both stages. This check will be done manually after the deadline, so the auto-grader will give no feedback on it.

There are some things you need to keep in mind. SHA256 is used almost everywhere in modern society, so many very fast implementations are available. You must not submit any existing implementation that you did not write yourself. However, you may refer to technical reports and articles about the algorithm and its optimizations and then implement your own. Also, you CANNOT use the built-in instruction sets dedicated to accelerating SHA-256, including Intel AES-NI or the Intel SHA extensions such as SHA256RNDS2, SHA256MSG1, SHA256MSG2, etc.

Submission

When your project is done, please submit all the files including the framework to your remote GitLab repo by running the following commands.

$ git commit -a
$ git push origin master:master

Autolab will discard all other files except for blockchain.c, hash_functions.c, hash_functions/sha256.c, hash_functions/sha256.h and Makefile.

How to Autolab

Similar to previous projects, upload your autolab.txt to Autolab to submit your project.

Submission Time Announcement

The time of your last push to the git repo counts as your submission time, also with respect to slip days. So do not commit to this repo after the due date unless you want to use slip days or are OK with getting fewer points.

Collaborative Coding and Frequent Pushing

You have to work on this project as a team. We invite you to use all of the features of GitLab for your project, for example branches, issues, wiki, milestones, etc.

We require you to push very frequently to gitlab. In your commits we want to see how the code evolved. We do NOT want to see the working code suddenly appear - this will make us suspicious.

We also require that all group members make substantial contributions to the project. This means that one group member should not finish the project alone; distribute the work among all group members!

At the end of Project 3 we will interview all group members and discuss their contributions, to see if we need to modify the score for certain group members.

Appendix

You may find the following resources helpful:

The SHA256 page from NIST: https://csrc.nist.gov/projects/hash-functions

The Intel Whitepaper for Fast SHA256 implementation: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/sha-256-implementations-paper.pdf

Analysis of SIMD Applicability to SHA Algorithms: https://software.intel.com/content/dam/develop/external/us/en/documents/aciicmez-166988.pdf

Server Configurations

Autolab Server (Just for your reference):

CPUs: 2 * Intel Xeon E5-2690 v4 2.6 GHz, 14 cores each (so 28 physical cores and 56 threads (logical cores) in total)
Memory: 256GB DDR4 2400MHz
4 threads are allowed for each grading job

Star Lab Server (Used for final speed test):

CPU: Intel Xeon E5-2650 v3 2.3 GHz, 10 cores (20 threads)
- L1 cache 640KB in total (64KB per core)
- L2 cache 2560KB in total (256KB per core)
- L3 cache 25600KB

Memory: 32GB