The long and winding road to high-performance image processing with MMX/SSE
@inproceedings{Conte2000TheLA,
  title     = {The long and winding road to high-performance image processing with {MMX/SSE}},
  author    = {Gianni Conte and Stefano Tommesani and Francesco Zanichelli},
  booktitle = {Proceedings Fifth IEEE International Workshop on Computer Architectures for Machine Perception},
  year      = {2000},
  pages     = {302--310},
  url       = {https://api.semanticscholar.org/CorpusID:13180531}
}
The paper introduces the complex programming model of the MMX/SSE extensions. It then discusses why achieving an effective performance increase over sequential code is no easy task, partly because of poor software support.
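The paper's own code is not reproduced here. As a minimal sketch of the SIMD programming model it discusses, the C example below brightens an 8-bit grayscale image in two ways: with a plain scalar loop and with a vectorized loop.

The example makes the following assumptions:

* It uses SSE2 through `<emmintrin.h>`. SSE2 is the 128-bit integer successor of MMX and is standard on x86-64.
* The function names `brighten_scalar` and `brighten_simd` are hypothetical.
* With MMX alone, the same unsigned saturating add (`PADDUSB`, intrinsic `_mm_adds_pu8`) works on 8 pixels per 64-bit register instead of 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 integer intrinsics */
#endif

/* Scalar reference: add `delta` to each 8-bit pixel, clamping at 255. */
static void brighten_scalar(uint8_t *dst, const uint8_t *src, size_t n,
                            uint8_t delta)
{
    for (size_t i = 0; i < n; i++) {
        unsigned v = (unsigned)src[i] + delta;
        dst[i] = (uint8_t)(v > 255u ? 255u : v);
    }
}

/* SIMD version: processes 16 pixels per iteration with one unsigned
 * saturating add (PADDUSB), so no explicit clamping is needed. */
static void brighten_simd(uint8_t *dst, const uint8_t *src, size_t n,
                          uint8_t delta)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i d = _mm_set1_epi8((char)delta); /* broadcast delta to 16 lanes */
    for (; i + 16 <= n; i += 16) {
        /* Unaligned loads/stores: buffers need not be 16-byte aligned. */
        __m128i px = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(px, d));
    }
#endif
    /* Remaining n % 16 pixels (or all of them without SSE2) stay scalar. */
    brighten_scalar(dst + i, src + i, n - i, delta);
}
```

Even this simple kernel needs extra work in its vector version: a separate tail loop, attention to alignment, and reliance on saturation semantics. This is the kind of effort beyond sequential code that the summary above refers to.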
42 Citations
Parallel High-Level Image Processing on a Standard PC
- 2003
Computer Science, Engineering
The paper studies higher-level image processing algorithms, whose outputs are image features and recognition results. It focuses on the Hough transform and geometric hashing, two commonly used algorithms for this purpose, implemented with SSE.
SIMD Implementations of Image Processing Kernels: Performance Comparison
- 2012
Computer Science, Engineering
Several multimedia extensions are reviewed and compared in terms of the execution time of programs that use them. Some well-known image processing kernels are implemented, and the reasons for not reaching the maximum speedup are discussed.
A Review of SIMD Multimedia Extensions and their Usage in Scientific and Engineering Applications
- 2008
Engineering, Computer Science
An overview of SIMD multimedia extensions is given that reviews recent trends in using them to accelerate multimedia, scientific, and engineering applications. It also argues for their further use in other significant, computationally intensive applications.
Avenues for High Performance Computation on a PC
- 2004
Computer Science, Engineering
This paper presents the combined use of two types of parallelism. It shows that this bi-level parallel mechanism achieves a far greater speed-up than using only the two CPUs.
Multi-CPU Video Processing
- 2007
Computer Science, Engineering
This paper investigates how thread-parallel video processing on multi-core CPUs can be used to accelerate the processing of high-definition video.
GRAPHIC: Gather and Process Harmoniously in the Cache With High Parallelism and Flexibility
- 2024
Computer Science, Engineering
GRAPHIC is reported as the first in-memory SIMD architecture that addresses the parallelism and irregular-data-access challenges of applying SIMD to logic-in-memory (LiM). It exploits content-addressable memory (CAM) and row-wise-accessible SRAM.
Reducing 3D Fast Wavelet Transform Execution Time Using Blocking and the Streaming SIMD Extensions
- 2005
Computer Science
Results show a 5x speedup in execution time over a version compiled with the Intel C/C++ compiler's maximum optimizations. The compression ratio and video quality of the original 3D-wavelet-transform-based encoder are maintained.
Current Research Efforts in Media ISA Development
- 2002
Computer Science, Engineering
This paper gives an overview of research efforts on developing instruction set architecture (ISA) extensions for general-purpose workstation and desktop processors. It reports speed-ups of 1.1 to 2.5 on complete applications compared with current ISA extensions.
Analysis, optimization and execution of general purpose multimedia applications on subword VLIW datapaths
- 2003
Computer Science, Engineering
Proposes an architecture and an analysis methodology that exploit parallelism across a wide range of multimedia applications. The approach provides better performance and broader applicability, which in turn enables the required realism in multimedia applications running on general-purpose processors.
Reducing 3D wavelet transform execution time through the Streaming SIMD Extensions
- 2003
Computer Science
This paper focuses on reducing the execution time of video compression algorithms based on the 3D wavelet transform. It uses the Streaming SIMD Extensions for some of the dimensions of the sequence and applies loop unrolling and data prefetching to critical parts of the code.
9 References
Evaluating MMX technology using DSP and multimedia applications
- 1998
Computer Science, Engineering
This paper evaluates the X86 architecture's multimedia extension (MMX) instruction set on a set of benchmarks to understand which aspects of native signal processing instruction sets are most useful, the current limitations, and how they can be utilized most efficiently.
Multimedia extensions for general-purpose processors
- 1997
Computer Science, Engineering
This paper gives an overview of the multimedia instructions that have been added to the instruction set architectures of general-purpose microprocessors to accelerate media processing. Examples are…
MMX technology extension to the Intel architecture
- 1996
Computer Science, Engineering
MMX technology extends the Intel architecture (IA) to improve the performance of multimedia, communications, and other numeric-intensive applications. It introduces data types and instructions to the IA that exploit the parallelism in these applications.
MMX-based DCT and MC algorithms for real-time pure software MPEG decoding
- 1999
Computer Science, Engineering
The results convincingly show that, with the addition of a suitable SIMD instruction set, a pure software solution for complex multimedia applications such as real-time MPEG video decoding becomes feasible.
RETROSPECTIVE: Performance of Image and Video Processing with General-purpose Processors and Media ISA Extensions
- 2023
Computer Science, Engineering
This paper addresses questions about the performance of image and video processing with general-purpose processors and media ISA extensions during a transition in architectures that tried to more aggressively leverage instruction-level parallelism (ILP) through techniques like out-of-order execution, speculative execution, etc.
An X86 microprocessor with multimedia extensions
- 1997
Computer Science
This sixth-generation, x86 instruction-set-compatible microprocessor implements a set of multimedia extensions. It uses two-level branch prediction based on an 8192-entry branch history table, a 16-entry branch target cache, and a 16-entry return address stack, and it identifies instruction boundaries.
How Multimedia Workloads Will Change Processor Design
- 1997
Computer Science, Engineering
The authors predict that high-performance, general-purpose processors will incorporate more media processing capabilities, eventually bringing about the demise of specialized media processors, except perhaps in embedded applications.
A common image processing framework for 2D barcode reading
- 1999
Computer Science
Presents an image processing system that locates, segments, and decodes the most common 2D symbol used in barcode applications, with the aim of achieving a unified computational structure.
Theme feature: challenges to combining general-purpose and multimedia processors
- 1997
Computer Science, Engineering