Laboratory of Parallel Architectures for Signal Processing

Mamute Seismic Tools

Mamute is an open-source high-performance computing software package for 3D geophysical modeling and inversion, developed at LAPPS/IMD/UFRN in partnership with Equinor.

The software implements wave-equation-based methods for large-scale seismic exploration workflows, providing production-ready implementations of Acoustic and Visco-Acoustic Modeling, Born Modeling, Reverse Time Migration (RTM), Full Waveform Inversion (FWI), and Least-Squares Migration (LSM).

Mamute is written in C++ and leverages OpenMP for shared-memory parallelism, MPI for distributed shot scheduling across cluster nodes, and CUDA for GPU acceleration, enabling scalable execution across modern HPC architectures.
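
To make the idea of distributing shots across processes concrete, here is a minimal sketch in plain Python. It assumes a static round-robin policy, which is only one possible choice; Mamute's actual scheduler is not described here and may well be dynamic. The function name `assign_shots` is hypothetical and no MPI library is used.

```python
def assign_shots(num_shots: int, num_ranks: int) -> list[list[int]]:
    """Assign shot indices to ranks in round-robin order (illustrative policy only)."""
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    # Rank r receives shots r, r + num_ranks, r + 2 * num_ranks, ...
    return [list(range(rank, num_shots, num_ranks)) for rank in range(num_ranks)]


schedule = assign_shots(num_shots=10, num_ranks=4)
for rank, shots in enumerate(schedule):
    print(f"rank {rank}: shots {shots}")
```

Because each shot can be processed independently, a static split like this needs no communication between ranks until the per-shot results are combined.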

To maximize efficiency, the toolkit incorporates advanced HPC techniques, including runtime autotuning of OpenMP scheduling parameters, wavefield checkpointing for memory-efficient large-scale runs, and application-level fault tolerance for resilient execution on large clusters.
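
The memory saving behind wavefield checkpointing can be illustrated with a toy example. The sketch below assumes a simple fixed-interval strategy: only every `interval`-th state is stored, and the states in between are recomputed on demand. The `step` function is a hypothetical stand-in for a real wave propagator, and nothing here reflects Mamute's actual implementation.

```python
def step(state: float) -> float:
    """Toy stand-in for one time step of a wave propagator (hypothetical)."""
    return 0.5 * state + 1.0


def forward_with_checkpoints(initial: float, num_steps: int, interval: int) -> dict[int, float]:
    """Run forward in time, keeping only every `interval`-th state."""
    checkpoints = {0: initial}
    state = initial
    for t in range(1, num_steps + 1):
        state = step(state)
        if t % interval == 0:
            checkpoints[t] = state
    return checkpoints


def state_at(t: int, checkpoints: dict[int, float], interval: int) -> float:
    """Rebuild the state at step t by recomputing from the nearest earlier checkpoint."""
    base = (t // interval) * interval
    state = checkpoints[base]
    for _ in range(t - base):
        state = step(state)
    return state


num_steps, interval = 12, 4
checkpoints = forward_with_checkpoints(8.0, num_steps, interval)

# Visit the states in reverse time order, as the backward pass of RTM or FWI requires.
for t in range(num_steps, -1, -1):
    print(t, state_at(t, checkpoints, interval))
```

In this run only 4 of the 13 states are kept in memory; the price is up to `interval - 1` extra forward steps per lookup.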

Mamute is configured using TOML files and supports both single- and double-precision floating-point values. A collection of Python utilities is also provided for seismic data preparation, model generation, and visualization.

More information available at lappsufrn.gitlab.io/mamute.

PaScal Suite

The Parallel Scalability Suite (PaScal Suite) is a set of tools for evaluating scalability trends and identifying performance bottlenecks in parallel programs running on shared-memory systems. The suite automates running the same application multiple times, collecting the measurements, and comparing the runs, which enables the analysis of scalability patterns across different core counts and problem sizes.
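
As an illustration of the kind of metric such an analysis relies on, the sketch below computes speedup and parallel efficiency from wall-clock times measured at several core counts. The timings are made-up example values, and the function name `scalability_table` is hypothetical; this is not the PaScal Suite's API or output format.

```python
def scalability_table(times_by_cores: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (cores, speedup, efficiency) rows relative to the smallest core count."""
    base_cores = min(times_by_cores)
    base_time = times_by_cores[base_cores]
    rows = []
    for cores in sorted(times_by_cores):
        speedup = base_time / times_by_cores[cores]
        # Efficiency: achieved speedup divided by the ideal speedup (cores / base_cores).
        efficiency = speedup * base_cores / cores
        rows.append((cores, speedup, efficiency))
    return rows


# Hypothetical wall-clock times in seconds for one problem size.
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}
rows = scalability_table(timings)
for cores, speedup, efficiency in rows:
    print(f"{cores:>2} cores  speedup {speedup:5.2f}  efficiency {efficiency:5.2f}")
```

Efficiency that drops as the core count grows, as in this invented data set, is the sort of trend that points to a scalability bottleneck worth investigating.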

Its main goal is to provide an automated, low-overhead environment that helps developers and researchers understand the behavior of parallel applications, facilitating the identification of performance limitations and potential optimization opportunities.

The PaScal Suite consists of two main tools: the PaScal Analyzer, which executes the application and collects performance metrics, and the PaScal Viewer, a visualization tool that interprets the collected data and presents it through graphical representations for scalability analysis.

More information available at pascalsuite.imd.ufrn.br.

DeLIA

Large-scale supercomputers are crucial for solving complex problems but are prone to failures, which makes fault-tolerance techniques necessary to mitigate system interruptions. The Dependability Library for Iterative Applications (DeLIA) provides features that enhance fault tolerance in bulk-synchronous programs. It is a flexible solution that integrates application-level data preservation, fault detection, and failover mechanisms. The library streamlines the integration of these features into applications while offering extensive configurability to adapt to diverse use cases.

DeLIA is an application-level fault-tolerance library designed to be user-friendly. To that end, we offer an application programming interface (API) that enables developers to incorporate our features into their software efficiently. The API lets users invoke DeLIA functions while abstracting away the library implementation, and the main parameters of DeLIA are defined in a JSON file. User documentation is available at https://lappsufrn.gitlab.io/delia. DeLIA is currently available in C++, Python, and Julia.
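
The general idea of application-level data preservation in an iterative program can be sketched as follows. This example does not use DeLIA's API: the functions `save_checkpoint`, `load_checkpoint`, and `run`, the checkpoint file layout, and the simulated failure are all hypothetical and only illustrate the checkpoint-and-restart pattern with the Python standard library.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, iteration: int, state: int) -> None:
    """Write the checkpoint to a temporary file, then rename it atomically."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"iteration": iteration, "state": state}, f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> tuple[int, int]:
    """Return (iteration, state) from the checkpoint, or a fresh start if none exists."""
    if not os.path.exists(path):
        return 0, 0
    with open(path) as f:
        data = json.load(f)
    return data["iteration"], data["state"]


def run(path: str, total_iterations: int, fail_at=None) -> int:
    """Toy iterative computation that resumes from the last saved iteration."""
    iteration, state = load_checkpoint(path)
    while iteration < total_iterations:
        if fail_at is not None and iteration == fail_at:
            raise RuntimeError("simulated failure")
        state += iteration  # stand-in for the real work of one iteration
        iteration += 1
        save_checkpoint(path, iteration, state)
    return state


with tempfile.TemporaryDirectory() as workdir:
    checkpoint_path = os.path.join(workdir, "checkpoint.json")
    try:
        run(checkpoint_path, total_iterations=6, fail_at=3)
    except RuntimeError:
        print("run interrupted, restarting from checkpoint")
    result = run(checkpoint_path, total_iterations=6)
    print("final state:", result)
```

The restarted run picks up at iteration 3 instead of 0 and reaches the same final state as an uninterrupted run. Writing to a temporary file and renaming it keeps a failure during the write from corrupting the last good checkpoint.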

CEVERO

The CEVERO Project aims to develop chip multi-processors for very energy-efficient aerospace missions.

Given the nature of their operation, processors for aerospace and mission-critical systems are customarily required to be fault-tolerant and energy-efficient. Fault tolerance is required to address the higher probability of radiation strikes when operating far from Earth's surface, which can cause failures due to "single event upsets" (SEUs) and "single event latchups" (SELs). Nowadays, these processors are even more susceptible to such events due to smaller transistor sizes and lower supply voltages in modern semiconductor technology.

The energy-efficiency requirement arises from power and cooling constraints. In space, there is no air to help dissipate heat, so the system's heat output must be limited. Besides producing more heat, higher power consumption implies higher financial costs: larger solar panels, larger batteries, and consequently more weight and volume to be launched into space.

In this project, we propose to design a flexible, fault-tolerant multi-core processor based on a leading-edge RISC-V platform for use in aerospace and mission-critical applications, such as the Brazilian nanosatellite network for environmental data collection. Multi-core systems have been widely used to achieve superior performance without an exponential increase in power consumption. The proposed design uses the inherent core-level redundancy of these systems to deliver fault tolerance. Additionally, considering that not all parts of a given aerospace application need to be fault-tolerant, e.g., those inherently corrupted by signal noise, the proposed design can switch to an unsafe mode to double or triple its available parallel performance. We aim to enable systems that deliver higher performance and energy efficiency in uncertainty-driven computations while switching to a fault-tolerant mode for critical computations, in response to the application's dynamic demand.
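
The trade-off between the two modes can be illustrated with a small software analogy. The sketch below assumes majority voting over redundant cores in the fault-tolerant mode and independent work per core in the unsafe mode. CEVERO targets hardware, so this is only a conceptual model: the cores are plain Python callables, and all names, including the simulated bit flip, are hypothetical.

```python
from collections import Counter


def healthy_core(task, arg):
    """A core that computes the task correctly."""
    return task(arg)


def faulty_core(task, arg):
    """A core whose result has its lowest bit flipped, mimicking a single event upset."""
    return task(arg) ^ 1


def run_fault_tolerant(task, arg, cores):
    """Run the same task on every core and accept only a strict-majority result."""
    results = [core(task, arg) for core in cores]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(cores) // 2:
        raise RuntimeError("cores disagree and no majority exists")
    return value


def run_performance(task, args, cores):
    """Give each core a different input: more throughput, no protection."""
    return [cores[i % len(cores)](task, arg) for i, arg in enumerate(args)]


def square(x):
    return x * x


cores = [healthy_core, faulty_core, healthy_core]
print(run_fault_tolerant(square, 7, cores))      # prints 49: the faulty result is outvoted
print(run_performance(square, [1, 2, 3], cores))  # prints [1, 5, 9]: the corrupted 5 goes unnoticed
```

With three cores the redundant mode masks a single fault, and with two cores a disagreement can only be detected, not corrected. In exchange, the unsafe mode processes two or three inputs in the time the redundant mode processes one.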

More information available at cevero.github.io.