Jeff Hammond jeffhammond

HPC software @NVIDIA in 🇫🇮. Previously @intel HPC, @argonne-lcf w/ Blue Gene and MPI. PhD in Chemistry from @uchicago for work on @nwchemgit. He/him/hän.

793 followers · 88 following

Stars

ParCoreLab / ICPE-talk

Keynote talk delivered by Didem Unat at Intl Conference on Performance Engineering 2026

2 Updated May 7, 2026

varunchotalia / cuda-ipc-benchmarks

Cuda 3 Updated May 12, 2026

kamping-site / kamping

KaMPIng: (Near) zero-overhead MPI wrapper for modern C++

C++ 69 7 Updated Apr 27, 2026

gpudirect / libmp

Simple message passing library

Cuda 30 7 Updated Aug 28, 2018

microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 762 59 Updated Aug 6, 2025

microsoft / BitNet

Official inference framework for 1-bit LLMs

Python 38,975 3,554 Updated Mar 10, 2026

facebookresearch / param

PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for evaluation of training and inference platforms.

Python 155 67 Updated May 6, 2026

xlite-dev / Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 5,216 375 Updated Apr 20, 2026

jedbrown / bgq-driver

Blue Gene/Q driver, see https://repo.anl-external.org/repos/bgq-driver/

C++ 5 2 Updated Jan 8, 2014

renatoGarcia / icecream-cpp

🍦 Never use cout/printf to debug again

C++ 741 38 Updated Apr 22, 2026

PlasmaFAIR / fortitude

A Fortran linter, written in Rust and installable with Python.

Rust 196 25 Updated May 13, 2026

cea-hpc / pcvs-benchmarks

Parallel Computing -- Validation Suite: Validation engine for Exascale project benchmarks

C 16 2 Updated Mar 26, 2026

ivan-pi / blis-fortran

Fortran bindings generated using Coccinelle

C++ 2 Updated Feb 2, 2025

federico-busato / Modern-CPP-Programming

Modern C++ Programming Course (C++03/11/14/17/20/23/26)

HTML 15,585 1,096 Updated Apr 19, 2026

rigtorp / isatomic

Test if AVX vector loads and stores are atomic

C++ 35 5 Updated Jul 9, 2020

TAPPorg / tensor-interfaces

A place to store information for the tensor discussions and possible specifications.

C 24 6 Updated Jul 2, 2025

jjgoings / McMurchie-Davidson

do a simple closed shell Hartree-Fock using McMurchie-Davidson to compute integrals

Python 89 18 Updated Jun 8, 2024

eugnsp / library

Data structures, algorithms, and C++ reference library

252 35 Updated Apr 18, 2026

rbitr / llm.f90

LLM inference in Fortran

Fortran 63 9 Updated May 30, 2024

jjgoings / pfapack

Forked from basnijholt/pfapack

Efficient numerical computation of the Pfaffian for dense and banded skew-symmetric matrices

Python 2 1 Updated Mar 18, 2026

giordano / julia-on-gh200

Julia 8 Updated Dec 9, 2024

BerkeleyLab / fortran-compiler-test-suite

A framework and suite of cases for testing a Fortran compiler

Python 11 3 Updated Nov 14, 2024

siboehm / SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda 1,181 183 Updated Sep 2, 2025

kpamnany / MultithreadingBenchmarks.jl

C 4 Updated Aug 24, 2023

JuliaGPU / GemmKernels.jl

Flexible and performant GEMM kernels in Julia

Julia 84 12 Updated May 14, 2026

leeping / geomeTRIC

Geometry optimization code that includes the TRIC coordinate system

Python 211 77 Updated Apr 20, 2026

psi-rking / optking

optking: A molecular geometry optimization program

Python 27 14 Updated Apr 17, 2026

jhrmnn / pyberny

Molecular structure optimizer

Python 130 26 Updated Dec 17, 2022

microsoft / accelerated-dft

Repository to host supporting information and code samples for Accelerated DFT

Jupyter Notebook 38 2 Updated Apr 29, 2025

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda 29,901 3,589 Updated Jun 26, 2025