Serial to Parallel Comparison Using High Performance Computing

Pranav Bhatt
10 min read · Dec 7, 2022


Figure: Model Representation (Basic), functioning shown through a model.

INTRODUCTION TO SERIAL AND PARALLEL PROGRAMMING

BUT FIRST, WHAT IS HPC?

The capacity to process data and carry out intricate calculations at high speed is known as high performance computing (HPC). To put things into perspective, a laptop or desktop with a 3 GHz processor can complete about 3 billion calculations every second.
Even though that is far faster than what a human can do, HPC solutions that can process quadrillions of computations per second are much faster still.

Daily-life example: The supercomputer is among the best-known kinds of HPC systems. A supercomputer is made up of tens of thousands of compute nodes that collaborate to finish one or more tasks. This is called parallel processing. It is much like having thousands of PCs networked together, combining compute power to complete tasks faster.

Why is HPC important?

It is through data that groundbreaking scientific discoveries are made, game-changing innovations are fueled, and quality of life is improved for billions of people around the globe. HPC is the foundation for scientific, industrial, and societal advancements.

Organizations require blazing-fast, highly dependable IT infrastructure to process, store, and analyze vast volumes of data in order to stay one step ahead of the competition. The amount of data that enterprises must work with is expanding dramatically as a result of technologies like the Internet of Things (IoT), artificial intelligence (AI), and 3-D imaging.

The capacity to analyze data in real time is essential for many tasks, like streaming a live sporting event, monitoring a storm's progression, testing new products, and examining market trends.

How does HPC work?

There are three main components:

  1. Network
  2. Compute
  3. Storage

Compute servers are networked together into a cluster to construct a high performance computing architecture. Algorithms and software applications run concurrently on the cluster's servers. The cluster is networked to data storage to capture the output. These components must work seamlessly together to complete a variety of tasks.

Each component must maintain a constant tempo with the others in order to perform at its peak. The storage component, for instance, needs to be able to feed and ingest data to and from the compute servers as quickly as it is processed. In a similar vein, the networking elements must be capable of supporting the rapid transfer of data between compute servers and data storage. The HPC system as a whole performs worse if one component cannot keep up with the others.

What is an HPC cluster?

A networked group of hundreds or thousands of compute servers makes up an HPC cluster. Each server is called a node. High performance computing is made possible by the parallel processing that takes place across the nodes in each cluster.

HPC use cases

HPC solutions are used for a range of tasks across numerous industries and can be deployed on-premises, at the edge, or in the cloud. Examples include:

Research labs: Scientists use HPC to understand the evolution of the universe, forecast and track storms, identify renewable energy sources, and develop novel materials.

Artificial intelligence and machine learning: HPC is employed to prevent credit card fraud, offer self-directed technical assistance, train autonomous vehicles, and advance cancer detection methods.

Healthcare: HPC is utilized to enable quicker, more accurate patient diagnosis as well as to aid in the development of treatments for diseases like cancer and diabetes.

PARFOR VARIABLES

  • Loop Variable: Loop index
  • Sliced Variables: Arrays whose segments are operated on by different iterations of the loop
  • Broadcast Variables: Variables defined before the loop whose value is required inside the loop, but never assigned inside the loop
  • Reduction Variables: Variables that accumulate a value across iterations of the loop, regardless of iteration order
  • Temporary Variables: Variables created inside the loop, and not accessed outside the loop
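
These categories come from MATLAB's parfor model, but the same roles appear in the pymp loops used later in this article. Here is a rough Python sketch of my own (not code from the original post) with each role annotated; the sizes and worker count are arbitrary:

```python
import pymp

n = 8                                             # broadcast: defined before the loop, only read inside
data = pymp.shared.array((n,), dtype='float64')   # sliced: each iteration writes only its own slot
total = pymp.shared.array((1,), dtype='float64')  # reduction: accumulated across iterations, order-independent

with pymp.Parallel(2) as p:
    for i in p.range(0, n):                       # loop variable: the index
        tmp = i * 0.5                             # temporary: created and used only inside the loop
        data[i] = tmp + n
        with p.lock:                              # locked update keeps the accumulation safe
            total[0] += data[i]
```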

USING PAR-FOR LOOP

Because multiple workers can compute simultaneously on the same loop, a parfor-loop can perform substantially better than a for-loop. Statements in the loop body are carried out concurrently by the parfor-loop.

parfor-loops are most beneficial for computationally intensive tasks or tasks that require numerous loop iterations, such as a Monte Carlo simulation.

When an iteration in your loop depends on the outcomes of prior iterations, a parfor-loop cannot be used. Every iteration needs to stand alone. This is because, unlike for-loop iterations, which are sequential, parfor-loop iterations offer no assurance of order.

Example (Serial):
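
A minimal Python sketch of the serial case (the work function, counts, and timing scaffold are illustrative placeholders of my own):

```python
import time

def work(i):
    # stand-in for an expensive computation that does not depend on other iterations
    return sum(k * k for k in range(10_000)) + i

start = time.time()
results = [work(i) for i in range(100)]
print('serial:', time.time() - start, 'seconds')
```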

Example (Parallel):
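
And a matching parallel sketch using pymp (the worker count of 4 is an arbitrary choice), where every iteration is independent, exactly as parfor requires:

```python
import time
import pymp

def work(i):
    # the same independent computation as in the serial version
    return sum(k * k for k in range(10_000)) + i

results = pymp.shared.array((100,), dtype='float64')
start = time.time()
with pymp.Parallel(4) as p:
    for i in p.range(0, 100):     # iterations are distributed across the 4 workers
        results[i] = work(i)
print('parallel:', time.time() - start, 'seconds')
```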

SPECIFY THREADS IN PARFOR

The maximum number of threads to use for a parfor-loop can be specified as in the following example.

Because you specify the maximum number of threads to use, the generated MEX function executes the loop iterations in parallel on as many cores as available, up to the maximum number that you specify.

Generate specify_num_threads as a MEX function by entering the codegen command at the MATLAB command line, using the -args {0} flag to indicate that the input u is a scalar double and the -report option to create a code generation report.
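
In the pymp examples used throughout the rest of this article, the analogous control is simply the worker count passed to pymp.Parallel, which caps how many workers the loop may use; a minimal sketch with names of my own:

```python
import pymp

MAX_WORKERS = 4                   # upper bound; pymp will not spawn more workers than this

y = pymp.shared.array((100,), dtype='float64')
with pymp.Parallel(MAX_WORKERS) as p:
    for i in p.range(0, 100):
        y[i] = i
```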

Using parfor to parallelize a loop

Essentially, parfor is advised in two situations: when your loop has many iterations (on the order of 1e10), or when each iteration takes a very long time (e.g., eig(magic(1e4))). In the second situation, you might want to consider utilising spmd instead. For short ranges or quick iterations, parfor is slower than a plain for-loop because of the overhead required to manage all the workers, rather than just performing the calculation.

Additionally, many functions have implicit multi-threading built in, so a parfor loop is not more efficient than a serial for-loop when calling these functions, because all cores are already in use. In this situation parfor is actually detrimental, since it adds allocation overhead while being no more parallel than the function you are trying to use.

SERIAL IMPLEMENTATION

Serial implementation means coding a solution to a problem in whichever computer language one is comfortable with, such that only one operation is carried out at a time.

Understanding through an example:

“A serial in television and radio programming is a program with an ongoing story that develops sequentially, episode by episode.”

The scope of a variable in serial programming is the portion of the program in which the variable may be used.

Example:
A variable initialized inside a C function has "function-wide" scope, meaning it can only be accessed within the function body. However, a variable declared at the start of a .c file but outside of any function has "file-wide" scope, which means that any function in the file can access it.
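
The implementations later in this article are in Python, where the same distinction shows up as module-level versus function-local names; a small sketch (names are my own):

```python
scale = 10            # "file-wide" scope: visible to every function in this module

def compute(x):
    offset = 2        # "function-wide" scope: exists only inside compute()
    return scale * x + offset

print(compute(3))     # prints 32
# print(offset)       # would raise NameError: offset is local to compute()
```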

FLOWCHART REPRESENTATION OF SERIAL IMPLEMENTATION

PROS AND CONS OF SERIAL IMPLEMENTATION

  1. Only one bit is sent at a time in serial communication, so it requires fewer input/output lines.
  2. It is also more resistant to crosstalk. The major benefit of serial communication is that it makes building an entire embedded system very affordable.

PARALLEL IMPLEMENTATION

Parallel implementation is the method of breaking a problem down into smaller tasks that can be carried out simultaneously using various computing resources. This also yields time efficiency: the time taken by a parallel implementation is much less than the time taken by a serial one.

To put it simply, it is the process of solving a problem by utilizing several resources, in this case, processors.

This type of programming takes a problem, breaks it down into a series of smaller steps, delivers instructions, and processors execute the solutions at the same time.

This kind of programming delivers the same outcomes as concurrent programming, just more quickly and efficiently. Many computers, including laptops and home desktops, rely on this style of programming to make sure that activities are executed quickly in the background.

Let's understand through an example (BUBBLE SORT); a sketch of both versions follows:
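
Since the original code appears as images, here is a sketch of my own of both versions. The parallel variant is odd-even transposition sort, the usual way to parallelize bubble sort: each phase compare-exchanges disjoint pairs, so the swaps within a phase are independent. Array contents and the worker count are arbitrary:

```python
import pymp

def bubble_sort(a):
    """Serial bubble sort: repeated adjacent compare-and-swap passes."""
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

def odd_even_sort(a, num_threads=4):
    """Parallel bubble sort variant: alternating even/odd phases of disjoint swaps."""
    n = len(a)
    shared = pymp.shared.array((n,), dtype='int64')
    shared[:] = a
    for phase in range(n):
        start = phase % 2
        with pymp.Parallel(num_threads) as p:
            for idx in p.range(0, (n - start) // 2):
                j = start + 2 * idx          # pairs (j, j+1) never overlap within a phase
                if shared[j] > shared[j + 1]:
                    shared[j], shared[j + 1] = shared[j + 1], shared[j]
    return list(shared)

print(bubble_sort([5, 2, 9, 1, 7]))
print(odd_even_sort([5, 2, 9, 1, 7]))
```

Re-entering pymp.Parallel on every phase re-forks the workers, so this sketch pays heavy overhead; it is meant to show the structure, not to be fast.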

FLOWCHART REPRESENTATION OF PARALLEL IMPLEMENTATION

PROS AND CONS OF PARALLEL IMPLEMENTATION

Using this programming over concurrent programming has two key benefits.

  1. When adopting parallel architectures, all processes are accelerated, improving efficiency and making better use of the resources needed to produce speedy outcomes.
  2. Because parallel computing produces results faster than concurrent programming, it is more cost effective. This is crucial when parallel processes are required to compile enormous volumes of data into manageable data sets or to solve challenging problems.

The use of parallel processing has various drawbacks.

  1. Harder to understand (It can be difficult to fully comprehend programming that targets parallel architectures at first.)
  2. In order to properly boost performance, code must be adapted for each target architecture. Additionally, consistent outcomes are difficult to predict, since particular architectures can make it difficult to communicate results.
  3. In order to keep the parallel clusters cool, a variety of cooling technologies will be needed. Power consumption is an issue for those implementing numerous processors for different architectures.

Where does it Fit in?

  1. Utilized for everything from engineering and science to shopping and research
  2. From a sociological standpoint, web search engines, applications, and multimedia technologies are where it is most often used.

Concurrent and parallel programming frequently encounter issues that must be resolved; even so, parallel computing is the way of the future of programming.

It is one of the most reliable programming methods now in use, while having both benefits and drawbacks.

For better reference to the ideas above, we will perform serial and parallel implementations of various basic algorithms and analyze the results, which will also make clearer where each type of programming should be applied.

SOME OF THE BASIC EVERYDAY EXAMPLES WE WILL BE WORKING ON ARE AS FOLLOWS:

  1. Matrix Multiplication
  2. Linear Search
  3. Array Sum
  4. Prime Numbers
  5. Prime in a Power Set
  6. Word Search in English Dictionary
  7. Image Convolution

Behind the code

We will run the code in an online environment such as Jupyter or Google Colab, and the code appears in this story as images. The serial implementations can easily be found on various sites and used directly.

For the parallel versions we will use PyMP to create threads, and we will examine the results for different numbers of threads so that we can find the optimal thread count for each algorithm.

BASIC OPENMP FOR IMPLEMENTATION

OpenMP's programming model rests on a few principles. The first is that everything runs as threads. The second is the fork-join model, with parallel regions in which one or more threads may be employed.

We will use PyMP to shorten our code, since it lets us create threads easily.

HOW TO INSTALL
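
Assuming the standard PyPI packaging of the pymp library, installation is a one-liner (the package is published under the name pymp-pypi):

```
pip install pymp-pypi
```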

For example, a simple introductory code with pymp:
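
A minimal version, along the lines of the introductory example in the pymp README: four workers fill a shared array, and p.print keeps the output from interleaving:

```python
import pymp

# a shared array that all worker processes can write to
ex_array = pymp.shared.array((100,), dtype='uint8')
with pymp.Parallel(4) as p:
    for index in p.range(0, 100):
        ex_array[index] = 1
        # the parallel print helper takes care of asynchronous output
        p.print('Yay! {} done!'.format(index))
```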

STARTING WITH MATRIX MULTIPLICATION

STARTING WITH SERIAL IMPLEMENTATION
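
A sketch of my own of the serial version (the matrix size N and the timing scaffold are arbitrary illustrative choices):

```python
import random
import time

N = 100
A = [[random.random() for _ in range(N)] for _ in range(N)]
B = [[random.random() for _ in range(N)] for _ in range(N)]

start = time.time()
C = [[0.0] * N for _ in range(N)]
for i in range(N):                    # classic triple loop: one multiply-add at a time
    for j in range(N):
        s = 0.0
        for k in range(N):
            s += A[i][k] * B[k][j]
        C[i][j] = s
print('serial time:', time.time() - start)
```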

OUTCOME

OUTPUT

PARALLEL IMPLEMENTATION
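
A matching pymp sketch of my own: rows of the result are independent, so they are split across the workers (thread count and size are again arbitrary):

```python
import time
import numpy as np
import pymp

N = 100
A = np.random.rand(N, N)              # inputs are created before the fork, so workers can read them
B = np.random.rand(N, N)
C = pymp.shared.array((N, N), dtype='float64')

start = time.time()
with pymp.Parallel(4) as p:
    for i in p.range(0, N):           # each worker computes a disjoint set of rows
        for j in range(N):
            s = 0.0
            for k in range(N):
                s += A[i, k] * B[k, j]
            C[i, j] = s
print('parallel time:', time.time() - start)
```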

OUTCOME

OUTPUT

There was a clear shift in time from the higher to the lower side, from milliseconds to microseconds, which helps in analyzing the outcome.

Linear Search

Serial and Parallel Outcome

SERIAL IMPLEMENTATION
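
A sketch of my own of the serial search (data size and target choice are illustrative; the target is taken from the array so the search always succeeds):

```python
import random
import time

data = [random.randint(0, 10**6) for _ in range(10**6)]
target = data[-1]                     # guarantees at least one match

start = time.time()
found_at = -1
for i, value in enumerate(data):      # scan the elements one by one
    if value == target:
        found_at = i
        break
print('index:', found_at, 'serial time:', time.time() - start)
```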

OUTPUT

PARALLEL IMPLEMENTATION
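
And the pymp version, a sketch of my own: each worker scans its share of the indices and records any hit in a shared slot (since iteration order is not guaranteed, which match wins is arbitrary):

```python
import random
import time
import pymp

data = [random.randint(0, 10**6) for _ in range(10**6)]
target = data[-1]

found = pymp.shared.array((1,), dtype='int64')
found[0] = -1
start = time.time()
with pymp.Parallel(4) as p:
    for i in p.range(0, len(data)):   # indices are split across the workers
        if data[i] == target:
            with p.lock:
                found[0] = i
print('index:', int(found[0]), 'parallel time:', time.time() - start)
```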

OUTPUT

Array Sum

Serial and Parallel Outcome

SERIAL IMPLEMENTATION
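
A serial sketch of my own (the array size is arbitrary):

```python
import random
import time

data = [random.random() for _ in range(10**7)]

start = time.time()
total = 0.0
for x in data:                        # accumulate one element at a time
    total += x
print('sum:', total, 'serial time:', time.time() - start)
```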

OUTPUT

PARALLEL IMPLEMENTATION
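
And a pymp sketch of my own using the reduction pattern: each worker keeps a private partial sum and folds it into the shared total under a lock exactly once:

```python
import random
import time
import pymp

data = [random.random() for _ in range(10**7)]

total = pymp.shared.array((1,), dtype='float64')
start = time.time()
with pymp.Parallel(4) as p:
    partial = 0.0                     # private to each worker process
    for i in p.range(0, len(data)):
        partial += data[i]
    with p.lock:                      # one locked update per worker, not per element
        total[0] += partial
print('sum:', float(total[0]), 'parallel time:', time.time() - start)
```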

OUTPUT

Prime Numbers

Serial and Parallel Outcome

Serial Implementation
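
A serial sketch of my own, counting primes by trial division up to an arbitrary limit:

```python
import time

def is_prime(n):
    if n < 2:
        return False
    f = 2
    while f * f <= n:                 # trial division up to sqrt(n)
        if n % f == 0:
            return False
        f += 1
    return True

LIMIT = 50_000
start = time.time()
count = sum(1 for n in range(2, LIMIT) if is_prime(n))
print(count, 'primes below', LIMIT, 'serial time:', time.time() - start)
```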

OUTPUT

Parallel Implementation
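
The matching pymp sketch: candidate numbers are split across the workers, each counts locally, and the counts are combined under a lock:

```python
import time
import pymp

def is_prime(n):
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

LIMIT = 50_000
count = pymp.shared.array((1,), dtype='int64')
start = time.time()
with pymp.Parallel(4) as p:
    local = 0                         # per-worker prime count
    for n in p.range(2, LIMIT):
        if is_prime(n):
            local += 1
    with p.lock:
        count[0] += local
print(int(count[0]), 'primes below', LIMIT, 'parallel time:', time.time() - start)
```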

Word Search in English Dictionary

Serial and Parallel Outcome

Serial Implementation
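
A serial sketch of my own; the word-list file name is a hypothetical placeholder (one dictionary word per line):

```python
import time

with open('words.txt') as f:          # hypothetical dictionary file, one word per line
    words = [line.strip() for line in f]

target = 'parallel'
start = time.time()
found = any(w == target for w in words)
print(found, 'serial time:', time.time() - start)
```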

OUTPUT

Parallel Implementation
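
And the pymp counterpart, again with a placeholder file name; a shared flag records whether any worker finds the word:

```python
import time
import pymp

with open('words.txt') as f:          # same hypothetical word list
    words = [line.strip() for line in f]

target = 'parallel'
found = pymp.shared.array((1,), dtype='int64')
start = time.time()
with pymp.Parallel(4) as p:
    for i in p.range(0, len(words)):
        if words[i] == target:
            found[0] = 1              # setting a flag is idempotent, so no lock is needed
print(bool(found[0]), 'parallel time:', time.time() - start)
```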

OUTPUT

Image Convolution

Serial and Parallel Outcome

Serial Implementation
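
A serial sketch of my own: a 3x3 box blur slid over a random image (sizes and kernel are arbitrary illustrative choices, and the convolution is "valid", i.e., without border padding):

```python
import time
import numpy as np

image = np.random.rand(512, 512)
kernel = np.ones((3, 3)) / 9.0        # simple box-blur kernel
kh, kw = kernel.shape
out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))

start = time.time()
for i in range(out.shape[0]):         # slide the kernel over every valid position
    for j in range(out.shape[1]):
        out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
print('serial time:', time.time() - start)
```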

OUTPUT

PARALLEL IMPLEMENTATION
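
The pymp counterpart of the same sketch: output rows are independent, so they are divided among the workers:

```python
import time
import numpy as np
import pymp

image = np.random.rand(512, 512)      # created before the fork, so workers can read it
kernel = np.ones((3, 3)) / 9.0
kh, kw = kernel.shape
H = image.shape[0] - kh + 1
W = image.shape[1] - kw + 1
out = pymp.shared.array((H, W), dtype='float64')

start = time.time()
with pymp.Parallel(4) as p:
    for i in p.range(0, H):           # each worker fills a disjoint set of output rows
        for j in range(W):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
print('parallel time:', time.time() - start)
```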

OUTPUT

RESULT

The data above clearly shows the difference in time efficiency between the serial and parallel implementations: only the matrix-multiplication type of problem appeared to be more time efficient with serial programming.

The rest were more efficient in parallel, with outputs shifting from the millisecond down to the microsecond range.
