Advanced Computer Architecture Project

Title

Performance Analysis and Optimization of a Multithreaded Application Using Intel VTune

Overview

This project analyzes and optimizes the performance of a matrix multiplication application using Intel VTune Profiler. It follows a progressive optimization approach: starting from a naive implementation, each stage adds one technique (loop tiling, multithreading with pthreads and then OpenMP, multi-level tiling, and SIMD vectorization) and is profiled to measure its effect.

The work was conducted as part of the Advanced Computer Architecture course.

Objectives

Profile a matrix multiplication application using Intel VTune.
Identify performance bottlenecks and inefficiencies.
Apply a series of optimizations to enhance performance.
Evaluate and compare results at each stage of optimization.

Tools and Technologies

Intel VTune Profiler: For detailed performance analysis.
C/C++: Core programming language.
POSIX Threads (pthreads) and OpenMP: For multithreading.
SIMD Vectorization: To enhance data-level parallelism.
Linux (Ubuntu): Development and testing environment.

Optimization Stages

The project follows a structured optimization pipeline:

Naive Matrix Multiplication
A standard triple-loop implementation with no optimizations. It serves as the baseline for all performance comparisons.
Tiled Matrix Multiplication
Applies loop tiling (blocking) so that each sub-block of the matrices is reused while it is still in cache, improving locality and reducing cache misses.
Tiled Matrix Multiplication with Pthreads
Parallelizes the tiled version with POSIX threads, distributing tile-based computations across threads.
Tiled Matrix Multiplication with OpenMP
Replaces the pthreads code with OpenMP parallel loops, which simplifies thread management and loop scheduling and improves scalability.
Tiled Matrix Multiplication with Three-Level Tiling
Uses a three-level tiling strategy, with one tile size per cache level (L1, L2, L3), to maximize cache reuse and minimize memory traffic.
Tiled Matrix Multiplication with OpenMP + SIMD Vectorization
Combines OpenMP threading with SIMD intrinsics or compiler vectorization pragmas to exploit both thread-level and data-level parallelism.
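
To make the stages concrete, here is a minimal sketch of the baseline kernel next to a single-level tiled kernel that adds an OpenMP parallel loop and a SIMD hint. It is not the project's actual source: the function names, the row-major flat-vector layout, and the tile size of 64 are assumptions chosen for illustration. The pthreads and three-level variants are omitted for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative names and layout (row-major n x n in flat vectors);
// none of these identifiers are taken from the project's code.
constexpr std::size_t TILE = 64;  // assumed tile edge; tune it to the cache size

// Stage 1: baseline triple loop in i-j-k order.
void matmul_naive(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];  // strided access to B
            C[i * n + j] = sum;
        }
    }
}

// Tiling + OpenMP + SIMD hint in one kernel. Each thread owns whole
// row tiles of C, so no two threads write the same element.
void matmul_tiled(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    std::fill(C.begin(), C.end(), 0.0);
    const double* a = A.data();
    const double* b = B.data();
    double* c = C.data();

    #pragma omp parallel for schedule(static)
    for (std::size_t ii = 0; ii < n; ii += TILE) {
        for (std::size_t kk = 0; kk < n; kk += TILE) {
            for (std::size_t jj = 0; jj < n; jj += TILE) {
                const std::size_t i_end = std::min(ii + TILE, n);
                const std::size_t k_end = std::min(kk + TILE, n);
                const std::size_t j_end = std::min(jj + TILE, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * n + k];
                        // Unit-stride inner loop: a good vectorization target.
                        #pragma omp simd
                        for (std::size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

With GCC or Clang, the pragmas take effect only when OpenMP is enabled (for example with `-fopenmp`); without it the code still compiles and runs serially, which is convenient for checking correctness against the naive version before profiling.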

Each version was profiled with VTune to observe improvements in:

CPU Utilization
Memory Access Efficiency
Thread Load Balance
Execution Time
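
VTune reports these metrics itself, but a small in-program timer is a useful sanity check when comparing stages. The sketch below is an assumption about how such a check could look, not part of the project: the helper names are hypothetical. It relies on the fact that multiplying two n x n matrices performs n^3 multiplications and n^3 additions, that is, 2n^3 floating-point operations.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helpers for comparing optimization stages.

// Runs `kernel` once and returns the elapsed wall-clock time in seconds.
double time_seconds(const std::function<void()>& kernel) {
    const auto start = std::chrono::steady_clock::now();
    kernel();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Throughput of an n x n matrix multiply: 2 * n^3 flops over the run time.
double gflops(std::size_t n, double seconds) {
    assert(seconds > 0.0);
    const double nd = static_cast<double>(n);
    return (2.0 * nd * nd * nd) / seconds / 1e9;
}

// Speedup of an optimized stage relative to the baseline.
double speedup(double baseline_seconds, double optimized_seconds) {
    assert(optimized_seconds > 0.0);
    return baseline_seconds / optimized_seconds;
}
```

For example, wrapping a kernel call in a lambda and passing it to `time_seconds` gives a wall-clock figure that can be cross-checked against the elapsed time VTune reports for the same run.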

Folder Structure

├── src/ # Source code of each version
│ ├── naive/ # Naive matrix multiplication
│ ├── tiled/ # Basic tiling
│ ├── tiled_pthreads/ # Tiling with pthreads
│ ├── tiled_openmp/ # Tiling with OpenMP
│ ├── tiled_3tile/ # Three-level tiled approach
│ └── tiled_simd/ # OpenMP + SIMD optimized
├── reports/ # VTune performance reports
├── screenshots/ # VTune visualizations
├── optimization_notes/ # Notes on strategies and changes
├── README.md # This file
└── Makefile # Build automation

Setup and Usage

Prerequisites

Intel VTune Profiler
GCC or Clang with OpenMP and SIMD support
Make utility
Linux OS (Ubuntu recommended)

Build and Run

cd src/tiled_simd
make
./matrix_mul


Repository: darcy5/Performance-Analysis-Project-HPP

Folders and files

Latest commit

History

Repository files navigation

Advanced Computer Architecture Project

Title

Overview

Objectives

Tools and Technologies

Optimization Stages

Folder Structure

Setup and Usage

Prerequisites

Build and Run

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages