BibSLEIGH

Tag #gpu

172 papers:

ASPLOS-2020-HuangJ0 #learning #memory management #named
SwapAdvisor: Pushing Deep Learning Beyond the GPU Memory Limit via Smart Swapping (CCH, GJ, JL0), pp. 1341–1355.
ASPLOS-2020-PengSD0MXYQ #learning #memory management #named
Capuchin: Tensor-based GPU Memory Management for Deep Learning (XP, XS, HD, HJ0, WM, QX, FY, XQ), pp. 891–905.
CGO-2020-ShobakiKM #approach #combinator #optimisation #using
Optimizing occupancy and ILP on the GPU using a combinatorial approach (GS, AK, SM), pp. 133–144.
IFM-2019-OsamaW #named #satisfiability
SIGmA: GPU Accelerated Simplification of SAT Formulas (MO, AW), pp. 514–522.
KDD-2019-FuZCR #named #optimisation #performance #robust #visualisation
AtSNE: Efficient and Robust Visualization on GPU through Hierarchical Optimization (CF, YZ, DC, XR), pp. 176–186.
ASE-2019-Laguna #detection #exception #float #named
FPChecker: Detecting Floating-Point Exceptions in GPU Applications (IL), pp. 1126–1129.
ASPLOS-2019-PhothilimthanaE #data flow #kernel #synthesis
Swizzle Inventor: Data Movement Synthesis for GPU Kernels (PMP, ASE, AW0, AJ, BH, HB, SJK, VG, ET, RB), pp. 65–78.
CGO-2019-LiL0 #optimisation #runtime
Accelerating GPU Computing at Runtime with Binary Optimization (GL, LL, XF0), pp. 276–277.
FASE-2019-PengR #effectiveness #kernel #named
CLTestCheck: Measuring Test Effectiveness for GPU Kernels (CP, AR), pp. 315–331.
ICFP-2018-ElsmanHAO #functional #higher-order #in the large #programming
Static interpretation of higher-order modules in Futhark: functional GPU programming in the large (ME, TH, DA, CEO), p. 30.
PLDI-2018-HongSKRKPRS #analysis #kernel #optimisation #using
GPU code optimization using abstract kernel emulation and sensitivity analysis (CH, ASR, JK, PSR, SK, LNP, FR, PS), pp. 736–751.
SAS-2018-AlurDS #independence #source code
Block-Size Independence for GPU Programs (RA, JD, NS), pp. 107–126.
ASPLOS-2018-Ausavarungnirun #concurrent #memory management #multi #named
MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency (RA, VM, JL, SG, JG, AJ, CJR, OM), pp. 503–518.
ASPLOS-2018-YaoMLSC #named #web
Sugar: Secure GPU Acceleration in Web Browsers (ZY, ZM, YL, AAS, AC), pp. 519–534.
CASE-2018-InuiEMZ #geometry #process #simulation #using
Geometric Simulation of Infeed Grinding Process of Silicon Wafer Using GPU (MI, YE, TM, LZ), pp. 1519–1524.
ESEC-FSE-2017-SorensenED #algorithm #kernel #multi
Cooperative kernels: GPU multitasking for blocking algorithms (TS0, HE, AFD), pp. 431–441.
CASE-2017-InuiKU #parallel #performance #using
Fast extraction of cutter engagement features by using the parallel processing function of a GPU (MI, MK, NU), pp. 668–673.
CC-2017-ShirakoHS #parallel #using
Optimized two-level parallelization for GPU accelerators using the polyhedral model (JS, AH, VS), pp. 22–33.
CGO-2017-GongCZUK #execution #hardware #kernel #named #scheduling
TwinKernels: an execution model to improve GPU hardware scheduling at compile time (XG, ZC, AKZ, RU, DRK), pp. 39–49.
CGO-2017-SteuwerRD #code generation #functional #information retrieval #named
Lift: a functional data-parallel IR for high-performance GPU code generation (MS, TR, CD), pp. 74–85.
CAV-2017-AlurDLS #detection #named #source code
GPUDrano: Detecting Uncoalesced Accesses in GPU Programs (RA, JD, OSNL, NS), pp. 507–525.
FM-2016-WijsNB #model checking
GPUexplore 2.0: Unleashing GPU Explicit-State Model Checking (AW, TN, DB), pp. 694–701.
PADL-2016-DovierFPV #implementation
A GPU Implementation of the ASP Computation (AD, AF, EP, FV), pp. 30–47.
PLDI-2016-SorensenD #fault #memory management
Exposing errors related to weak memory in GPU applications (TS0, AFD), pp. 100–113.
CC-2016-MajetiMBS #architecture #automation #cpu #generative #kernel #layout
Automatic data layout generation and kernel mapping for CPU+GPU architectures (DM, KSM, RB, VS), pp. 240–250.
CGO-2016-BarikFLHS #approach #black box #cpu #energy #scheduling
A black-box approach to energy-aware scheduling on integrated CPU-GPU systems (RB, NF, BTL, CH, TS), pp. 70–81.
VLDB-2015-BoghCA #parallel
Work-Efficient Parallel Skyline Computation for the GPU (KSB, SC, IA), pp. 962–973.
FDG-2015-MarkBMT #3d #game studies #generative
Procedural Generation of 3D Caves for Games on the GPU (BM, TB, TM, JT).
CIKM-2015-JoKB #analysis #matrix #multi #network #performance #scalability #social
Efficient Sparse Matrix Multiplication on GPU for Large Social Network Analysis (YYJ, SWK, DHB), pp. 1261–1270.
ICML-2015-TristanTS #estimation #performance
Efficient Training of LDA on a GPU by Mean-for-Mode Estimation (JBT, JT, GLSJ), pp. 59–68.
PLDI-2015-SharmaBA #source code #verification
Verification of producer-consumer synchronization in GPU programs (RS, MB, AA), pp. 88–98.
ICSE-v2-2015-Salgado #behaviour #cpu #interactive #kernel #profiling
Profiling Kernels Behavior to Improve CPU / GPU Interactions (RS), pp. 754–756.
SAC-2015-JoselliJC #animation #data type #named #proximity
NGrid: a proximity data structure for fluids animation with GPU computing (MJ, JRdSJ, EC), pp. 1303–1308.
SAC-2015-MartinCBGP #algorithm
OpenACC-based GPU acceleration of an optical flow algorithm (NM, JC, GB, CG, MP), pp. 96–98.
GPCE-2015-KolesnichenkoPN #contract #programming
Contract-based general-purpose GPU programming (AK, CMP, SN, BM), pp. 75–84.
ASPLOS-2015-AlglaveBDGKPSW #behaviour #concurrent #programming
GPU Concurrency: Weak Behaviours and Programming Assumptions (JA, MB, AFD, GG, JK, DP, TS, JW), pp. 577–591.
ASPLOS-2015-ParkPM #collaboration #multi #named
Chimera: Collaborative Preemption for Multitasking on a Shared GPU (JJKP, YP, SAM), pp. 593–606.
CGO-2015-LiYLZ #automation #memory management
Automatic data placement into GPU on-chip memory resources (CL, YY, ZL, HZ), pp. 23–33.
DAC-2015-HanF #analysis #approach #cpu #graph #scalability
Transient-simulation guided graph sparsification approach to scalable harmonic balance (HB) analysis of post-layout RF circuits leveraging heterogeneous CPU-GPU computing systems (LH, ZF), p. 6.
DAC-2015-KadjoAKG #approach #cpu #energy #mobile #performance #platform
A control-theoretic approach for energy efficient CPU-GPU subsystem in mobile platforms (DK, RA, MK, PVG), p. 6.
DATE-2015-GerumBR #performance #simulation
Source level performance simulation of GPU cores (CG, OB, WR), pp. 217–222.
DATE-2015-NguyenASS #platform #simulation
Accelerating complex brain-model simulations on GPU platforms (HADN, ZAA, GS, CS), pp. 974–979.
DATE-2015-ParkAHYL #big data #energy #low cost #memory management #performance
Memory fast-forward: a low cost special function unit to enhance energy efficiency in GPU for big data processing (EP, JA, SH, SY, SL), pp. 1341–1346.
DATE-2015-WangLWY
Eliminating intra-warp conflict misses in GPU (BW, ZL, XW, WY), pp. 689–694.
HPCA-2015-AroraMPJT #behaviour #benchmark #comprehension #cpu #metric #power management
Understanding idle behavior and power gating mechanisms in the context of modern benchmarks on CPU-GPU Integrated systems (MA, SM, IP, NJ, DMT), pp. 366–377.
HPCA-2015-LengZR #architecture
GPU voltage noise: Characterization and hierarchical smoothing of spatial and temporal voltage noise interference in GPU architectures (JL, YZ, VJR), pp. 161–173.
HPCA-2015-SethiaJM #memory management #named
Mascar: Speeding up GPU warps by reducing memory pitstops (AS, DAJ, SAM), pp. 174–185.
HPCA-2015-TiwariGRMRVOLDN #comprehension #design #fault #scalability
Understanding GPU errors on large-scale HPC systems and the implications for system design and operation (DT, SG, JHR, DM, PR, SSV, DAGdO, DL, ND, POAN, LC, ASB), pp. 331–342.
HPDC-2015-WahibM #automation #kernel #scalability
Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications (MW, NM), pp. 259–270.
HPDC-2015-XiaoCHZ #cpu #monte carlo
Monte Carlo Based Ray Tracing in CPU-GPU Heterogeneous Systems and Applications in Radiation Therapy (KX, DZC, XSH, BZ), pp. 247–258.
PDP-2015-BlaskiewiczZBD #network #parallel
An Application of GPU Parallel Computing to Power Flow Calculation in HVDC Networks (PB, MZ, PB, PD), pp. 635–641.
PDP-2015-KonstantinidisC #bound #kernel #memory management #performance
A Practical Performance Model for Compute and Memory Bound GPU Kernels (EK, YC), pp. 651–658.
PDP-2015-NakanoI #algorithm #implementation #memory management #parallel
Optimality of Fundamental Parallel Algorithms on the Hierarchical Memory Machine, with GPU Implementation (KN, YI), pp. 626–634.
PDP-2015-SoroushniaDPP #algorithm #implementation #parallel #pattern matching
Parallel Implementation of Fuzzified Pattern Matching Algorithm on GPU (SS, MD, TP, JP), pp. 341–344.
PDP-2015-YounessIMO #implementation #optimisation #performance #problem #satisfiability
An Efficient Implementation of Ant Colony Optimization on GPU for the Satisfiability Problem (HAY, AI, MM, MO), pp. 230–235.
PPoPP-2015-PiaoKOLKKL #adaptation #cpu #framework #javascript #named
JAWS: a JavaScript framework for adaptive CPU-GPU work sharing (XP, CK, YO, HL, JK, HK, JWL), pp. 251–252.
PPoPP-2015-ShiLDHJLWLZ #graph #hybrid #optimisation
Optimization of asynchronous graph processing on GPU with hybrid coloring model (XS, JL, SD, BH, HJ, LL, ZW, XL, JZ), pp. 271–272.
PPoPP-2015-WangDPWRO #graph #library #named
Gunrock: a high-performance graph processing library on the GPU (YW, AAD, YP, YW, AR, JDO), pp. 265–266.
TACAS-2015-Wijs #branch #similarity
GPU Accelerated Strong and Branching Bisimilarity Checking (AW), pp. 368–383.
VLDB-2015-HeZH14 #architecture #cpu #query
In-Cache Query Co-Processing on Coupled CPU-GPU Architectures (JH, SZ, BH), pp. 329–340.
ICEIS-v1-2014-PenaAMFF #algorithm #parallel #using
An Improved Parallel Algorithm Using GPU for Siting Observers on Terrain (GCP, MVAA, SVGM, WRF, CRF), pp. 367–375.
SEKE-2014-JuniorCMS #data analysis #repository
Exploratory Data Analysis of Software Repositories via GPU Processing (JRDSJ, EC, LM, AS), pp. 495–500.
OOPSLA-2014-HolkNSL #data type #memory management #programming language
Region-based memory management for GPU programming languages: enabling rich data structures on a spartan host (EH, RN, JGS, AL), pp. 141–155.
CGO-2014-XuWGLGQ #architecture #memory management #transaction
Software Transactional Memory for GPU Architectures (YX, RW, NG, TL, LG, DQ), p. 1.
DAC-2014-KoKYKH #cpu #platform #simulation
Hardware-in-the-loop Simulation for CPU/GPU Heterogeneous Platforms (YK, TK, YY, MK, SH), p. 6.
DAC-2014-PathaniaJPM #3d #cpu #game studies #mobile #power management
Integrated CPU-GPU Power Management for 3D Mobile Games (AP, QJ, AP, TM), p. 6.
DATE-2014-LeeL #3d #on the #reduction
On GPU bus power reduction with 3D IC technologies (YJL, SKL), pp. 1–6.
HPCA-2014-ElTantawyMOA #architecture #control flow #multi #performance #scalability
A scalable multi-path microarchitecture for efficient GPU control flow (AE, JWM, MO, TMA), pp. 248–259.
HPCA-2014-KimLJK #architecture #memory management #named #using
GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management (YK, JL, JEJ, JK), pp. 546–557.
HPCA-2014-NugterenBCB #distance #modelling #reuse
A detailed GPU cache model based on reuse distance theory (CN, GJvdB, HC, HEB), pp. 37–48.
HPCA-2014-PowerHW
Supporting x86-64 address translation for 100s of GPU lanes (JP, MDH, DAW), pp. 568–578.
OSDI-2014-KimHZHWWS #abstraction #named #network #source code
GPUnet: Networking Abstractions for GPU Programs (SK, SH, XZ, YH, AW, EW, MS), pp. 201–216.
PDP-2014-ArbelaezC #constraints #implementation #parallel
A GPU Implementation of Parallel Constraint-Based Local Search (AA, PC), pp. 648–655.
PDP-2014-BelliniBCMN #analysis #simulation
Simulation and Analysis of the Blood Coagulation Cascade Accelerated on GPU (MB, DB, PC, GM, MSN), pp. 590–593.
PDP-2014-BoobGP #automation #cpu #parallel #performance
Automated Instantiation of Heterogeneous Fast Flow CPU/GPU Parallel Pattern Applications in Clouds (SB, HGV, AMP), pp. 162–169.
PDP-2014-CamargosSGM #linear
Iterative Solution on GPU of Linear Systems Arising from the A-V Edge-FEA of Time-Harmonic Electromagnetic Phenomena (AFPC, VCS, JMG, GM), pp. 365–371.
PDP-2014-DefourM #fuzzy #library #named
FuzzyGPU: A Fuzzy Arithmetic Library for GPU (DD, MM), pp. 624–631.
PDP-2014-GargH #cpu #library #multi
A Portable and High-Performance General Matrix-Multiply (GEMM) Library for GPUs and Single-Chip CPU/GPU Systems (RG, LJH), pp. 672–680.
PDP-2014-IshigamiKN #algorithm #implementation
GPU Implementation of Inverse Iteration Algorithm for Computing Eigenvectors (HI, KK, YN), pp. 664–671.
PDP-2014-SanchezAGCMC #approach #named
FRODRUG: A Virtual Screening GPU Accelerated Approach for Drug Discovery (SGS, ERA, JIG, PC, ASM, RC), pp. 594–600.
PDP-2014-Topa #automaton #memory management #performance
Cellular Automata Model Tuned for Efficient Computation on GPU with Global Memory Cache (PT), pp. 380–383.
CAV-2014-BardsleyBCCDDKLQ #kernel #verification
Engineering a Static Verification Tool for GPU Kernels (EB, AB, NC, PC, PD, AFD, JK, DL, SQ), pp. 226–242.
VLDB-2013-Bress #hybrid #performance #query #why
Why it is time for a HyPE: A Hybrid Query Processing Engine for Efficient GPU Coprocessing in DBMS (SB), pp. 1398–1403.
VLDB-2013-HeLH #architecture #cpu
Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture (JH, ML, BH), pp. 889–900.
VLDB-2013-YuanL0 #query
The Yin and Yang of Processing Data Warehousing Queries on GPU Devices (YY, RL, XZ), pp. 817–828.
VLDB-2013-ZhangHHL #architecture #cpu #named #parallel #performance #query #towards
OmniDB: Towards Portable and Efficient Query Processing on Parallel CPU/GPU Architectures (SZ, JH, BH, ML), pp. 1374–1377.
CSMR-2013-ScannielloECG #using
Using the GPU to Green an Intensive and Massive Computation System (GS, UE, GC, CG), pp. 384–387.
ICFP-2013-McDonellCKL #functional #optimisation #source code
Optimising purely functional GPU programs (TLM, MMTC, GK, BL), pp. 49–60.
OOPSLA-2013-ChongDKKQ #abstraction #analysis #invariant #kernel
Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels (NC, AFD, PHJK, JK, SQ), pp. 605–622.
ASPLOS-2013-JooybarFODA #architecture #named
GPUDet: a deterministic GPU architecture (HJ, WWLF, MO, JD, TMA), pp. 1–12.
DAC-2013-HanZF #named #parallel #simulation
TinySPICE: a parallel SPICE simulator on GPU for massively repeated small circuit simulations (LH, XZ, ZF), p. 8.
DATE-2013-ZakharenkoAM #cpu #performance #using
Characterizing the performance benefits of fused CPU/GPU systems using FusionSim (VZ, TMA, AM), pp. 685–688.
HPCA-2013-LustigM #cpu #fine-grained #latency
Reducing GPU offload latency via fine-grained CPU-GPU synchronization (DL, MM), pp. 354–365.
HPCA-2013-RhuE #control flow #execution #performance
The dual-path execution model for efficient GPU control flow (MR, ME), pp. 591–602.
HPCA-2013-SinghSFOA #architecture
Cache coherence for GPU architectures (IS, AS, WWLF, MO, TMA), pp. 578–590.
HPDC-2013-YuZQYWG #game studies #named #scheduling
VGRIS: virtualized GPU resource isolation and scheduling in cloud gaming (MY, CZ, ZQ, JY, YW, HG), pp. 203–214.
PDP-2013-BukataS #algorithm #design #problem #scheduling
A GPU Algorithm Design for Resource Constrained Project Scheduling Problem (LB, PS), pp. 367–374.
PDP-2013-GuptaGV #3d #linear #simulation #using
3D Bubbly Flow Simulation on the GPU — Iterative Solution of a Linear System Using Sub-domain and Level-Set Deflation (RG, MBvG, CV), pp. 359–366.
PDP-2013-LavilleMLPM #multi #simulation #using
Using GPU for Multi-Agent Soil Simulation (GL, KM, CL, LP, NM), pp. 392–399.
PPoPP-2013-DeoK #array #parallel
Parallel suffix array and least common prefix for the GPU (MD, SK), pp. 197–206.
PPoPP-2013-WuZZJS #algorithm #analysis #complexity #design #memory management
Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU (BW, ZZ, EZZ, YJ, XS), pp. 57–68.
PPoPP-2013-YangXFGLXLSYZ #algorithm #cpu #simulation
A peta-scalable CPU-GPU algorithm for global atmospheric simulations (CY, WX, HF, LG, LL, YX, YL, JS, GY, WZ), pp. 1–12.
ESOP-2013-CollingbourneDKQ #analysis #kernel #semantics #verification
Interleaving and Lock-Step Semantics for Analysis and Verification of GPU Kernels (PC, AFD, JK, SQ), pp. 270–289.
VLDB-2012-WangHLWZS #cpu #hybrid #image
Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems (KW, YH, RL, FW, XZ, JHS), pp. 1543–1554.
ICFP-2012-BergstromR
Nested data-parallelism on the GPU (LB, JHR), pp. 247–258.
GRAPHITE-2012-Cormie-Bowins #comparison #implementation #reachability
A Comparison of Sequential and GPU Implementations of Iterative Methods to Compute Reachability Probabilities (ECB), pp. 20–34.
CIKM-2012-KozawaAK #database #mining #nondeterminism #probability
GPU acceleration of probabilistic frequent itemset mining from uncertain databases (YK, TA, HK), pp. 892–901.
CIKM-2012-MasadaT #topic
Extraction of topic evolutions from references in scientific articles and its GPU acceleration (TM, AT), pp. 1522–1526.
OOPSLA-2012-BettsCDQT #kernel #named #verification
GPUVerify: a verifier for GPU kernels (AB, NC, AFD, SQ, PT), pp. 113–132.
PLDI-2012-LeungGAGJL #kernel #verification
Verifying GPU kernels by test amplification (AL, MG, YA, RG, RJ, SL), pp. 383–394.
SAC-2012-FazackerleyML #database
GPU accelerated AES-CBC for database applications (SF, SMM, RL), pp. 873–878.
SAC-2012-JiXWLTY #sequence
High-throughput antibody sequence alignment based on GPU computing (GJ, ZX, XW, SL, MT, JY), pp. 1417–1418.
CC-2012-UnkuleSQ #automation #kernel #locality #thread
Automatic Restructuring of GPU Kernels for Exploiting Inter-thread Data Locality (SU, CS, AQ), pp. 21–40.
CGO-2012-JablinJPLA #architecture #cpu
Dynamically managed data for CPU-GPU architectures (TBJ, JAJ, PP, FL, DIA), pp. 165–174.
CGO-2012-ZhangM #3d #clustering
Auto-generation and auto-tuning of 3D stencil codes on GPU clusters (YZ, FM), pp. 155–164.
DAC-2012-JeongESP #cpu #memory management
A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC (MKJ, ME, CS, NCP), pp. 850–855.
DAC-2012-KimLCKWYL #cpu #hybrid #in memory #memory management
Hybrid DRAM/PRAM-based main memory for single-chip CPU/GPU (DK, SL, JC, DK, DHW, SY, SL), pp. 888–896.
DAC-2012-RenCWZY #parallel #simulation
Sparse LU factorization for parallel circuit simulation on GPU (LR, XC, YW, CZ, HY), pp. 1125–1130.
DAC-2012-VincoCBF #architecture #named
SAGA: SystemC acceleration on GPU architectures (SV, DC, VB, FF), pp. 115–120.
HPCA-2012-LeeK #architecture #cpu #named #policy
TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture (JL, HK), pp. 91–102.
HPCA-2012-YangXMZ #architecture #cpu
CPU-assisted GPGPU on fused CPU-GPU architectures (YY, PX, MM, HZ), pp. 103–114.
PDP-2012-BoukedjarLB #bound #branch #cpu #parallel
Parallel Branch and Bound on a CPU-GPU System (AB, MEL, DEB), pp. 392–398.
PDP-2012-DilchM #algorithm #analysis #architecture #novel #optimisation #performance
Optimization Techniques and Performance Analyses of Two Life Science Algorithms for Novel GPU Architectures (DD, EM), pp. 376–383.
PDP-2012-FortinGG #towards
Towards Solving the Table Maker’s Dilemma on GPU (PF, MG, SG), pp. 407–415.
PDP-2012-RungsawangM #clustering #performance #rank
Fast PageRank Computation on a GPU Cluster (AR, BM), pp. 450–456.
PDP-2012-SpigaG #cpu #hybrid #library #migration #named #quantum
phiGEMM: A CPU-GPU Library for Porting Quantum ESPRESSO on Hybrid Systems (FS, IG), pp. 368–375.
PPoPP-2012-KimSLNJL #clustering #cpu #programming
OpenCL as a unified programming model for heterogeneous CPU/GPU clusters (JK, SS, JL, JN, GJ, JL), pp. 299–300.
PPoPP-2012-LiuAHLSZWT #implementation #named
FlexBFS: a parallelism-aware implementation of breadth-first search on GPU (GL, HA, WH, XL, TS, WZ, XW, XT), pp. 279–280.
PPoPP-2012-Mendez-LojoBP #analysis #implementation #points-to
A GPU implementation of inclusion-based points-to analysis (MML, MB, KP), pp. 107–116.
PPoPP-2012-MerrillGG #graph #scalability #traversal
Scalable GPU graph traversal (DM, MG, ASG), pp. 117–128.
PPoPP-2012-TaoBB #development #kernel #scalability #using
Using GPU’s to accelerate stencil-based computation kernels for the development of large scale scientific applications on heterogeneous systems (JT, MB, SRB), pp. 287–288.
JCDL-2011-LiZYWWW #recommendation #social #using
A social network-aware top-N recommender system using GPU (RL, YZ, HY, XW, JW, BW), pp. 287–296.
CSCW-2011-AspinR #3d #approach #multi
A GPU based, projective multi-texturing approach to reconstructing the 3D human form for application in tele-presence (RAA, DJR), pp. 105–112.
CIKM-2011-KrulisLBSS #architecture #distance #manycore #polynomial
Processing the signature quadratic form distance on many-core GPU architectures (MK, JL, CB, TS, TS), pp. 2373–2376.
PLDI-2011-JablinPJJBA #automation #communication #cpu #optimisation
Automatic CPU-GPU communication management and optimization (TBJ, PP, JAJ, NPJ, SRB, DIA), pp. 142–151.
ASPLOS-2011-ZhangJGTS #on the fly
On-the-fly elimination of dynamic irregularities for GPU computing (EZZ, YJ, ZG, KT, XS), pp. 369–380.
DAC-2011-ZhaoF #3d #parallel #performance #platform
Fast multipole method on GPU: tackling 3-D capacitance extraction on massively parallel SIMD platforms (XZ, ZF), pp. 558–563.
DAC-2011-ZhuDC #architecture #cpu #named
Hermes: an integrated CPU/GPU microarchitecture for IP routing (YZ, YD, YC), pp. 1044–1049.
DATE-2011-KangD #classification #metaprogramming #scalability
Scalable packet classification via GPU metaprogramming (KK, YSD), pp. 871–874.
DATE-2011-Wang #coordination #kernel #power management
Coordinate strip-mining and kernel fusion to lower power consumption on GPU (GW), pp. 1218–1219.
HPCA-2011-ZhangO #analysis #architecture #performance
A quantitative performance analysis model for GPU architectures (YZ, JDO), pp. 382–393.
HPDC-2011-LiLTCZ #3d #cpu #experience #re-engineering
Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system (LL, XL, GT, MC, PZ), pp. 195–204.
HPDC-2011-RaviBAC #framework #runtime
Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework (VTR, MB, GA, STC), pp. 217–228.
ISMM-2011-VeldemaP
Iterative data-parallel mark&sweep on a GPU (RV, MP), pp. 1–10.
PDP-2011-BarlasHJ #approach #case study #cpu #design #encryption #parallel
An Analytical Approach to the Design of Parallel Block Cipher Encryption/Decryption: A CPU/GPU Case Study (GDB, AH, YAJ), pp. 247–251.
PDP-2011-BoyerBE #multi #programming
Dense Dynamic Programming on Multi GPU (VB, DEB, ME), pp. 545–551.
PDP-2011-EschweilerBW #behaviour #performance
Patterns of Inefficient Performance Behavior in GPU Applications (DE, DB, FW), pp. 262–266.
PDP-2011-KarantasisP #abstraction #clustering #memory management #programming
Programming GPU Clusters with Shared Memory Abstraction in Software (KIK, EDP), pp. 223–230.
PDP-2011-YurtesenRAW #equation #implementation #integration
SSE Vectorized and GPU Implementations of Arakawa’s Formula for Numerical Integration of Equations of Fluid Motion (EY, MR, MA, JW), pp. 341–348.
PPoPP-2011-ZhengRQA #detection #named #source code
GRace: a low-overhead mechanism for detecting data races in GPU programs (MZ, VTR, FQ, GA), pp. 135–146.
Haskell-2010-MainlandM #haskell #named
Nikola: embedding compiled GPU functions in Haskell (GM, GM), pp. 67–78.
FSE-2010-LiG #kernel #scalability #smt #verification
Scalable SMT-based verification of GPU kernel functions (GL, GG), pp. 187–196.
ASPLOS-2010-WooL #named #programmable #using
COMPASS: a programmable data prefetcher using idle GPU shaders (DHW, HHSL), pp. 297–310.
DAC-2010-LuoWH #effectiveness #implementation
An effective GPU implementation of breadth-first search (LL, MDFW, WmWH), pp. 52–55.
DATE-2010-RathiDGCV #distance #feature model #implementation
A GPU based implementation of Center-Surround Distribution Distance for feature extraction and matching (AR, MD, WG, RTC, NV), pp. 172–177.
HPDC-2010-GharaibehAGR
A GPU accelerated storage system (AG, SAK, SG, MR), pp. 167–178.
HPDC-2010-LinWG #migration
OpenGL application live migration with GPU acceleration in personal cloud (YL, WW, KG), pp. 280–283.
PDP-2010-GaikwadT #linear #parallel
Parallel Iterative Linear Solvers on GPU: A Financial Engineering Case (AG, IMT), pp. 607–614.
PPoPP-2010-BaghsorkhiDPGH #adaptation #architecture #modelling #performance
An adaptive performance modeling tool for GPU architectures (SSB, MD, SJP, WDG, WmWH), pp. 105–114.
PPoPP-2010-SandesM #comparison #named #sequence #using
CUDAlign: using GPU to accelerate the comparison of megabase genomic sequences (EFdOS, ACMAdM), pp. 137–146.
PPoPP-2010-ZhangCO #performance
Fast tridiagonal solvers on the GPU (YZ, JC, JDO), pp. 127–136.
DAC-2009-ShiCHMTHW #analysis #grid #network #performance #power management
GPU friendly fast Poisson solver for structured power grid network analysis (JS, YC, WH, LM, SXDT, PHH, XW), pp. 178–183.
ICPR-2008-KauffmannP #automaton
Cellular automaton for ultra-fast watershed transform on GPU (CK, NP), pp. 1–4.
CGO-2008-RyooRSBUSH #optimisation #parallel #thread
Program optimization space pruning for a multithreaded GPU (SR, CIR, SSS, SSB, SZU, JAS, WmWH), pp. 195–204.
DAC-2008-Garland #manycore #matrix
Sparse matrix computations on manycore GPU’s (MG), pp. 2–6.
DATE-2008-CopeCL #configuration management #logic #memory management #using
Using Reconfigurable Logic to Optimise GPU Memory Accesses (BC, PYKC, WL), pp. 44–49.
PPoPP-2008-FernandesSS #parallel
Massive parallel LDPC decoding on GPU (GFPF, LS, VMMdS), pp. 83–90.
PPoPP-2008-RyooRBSKH #evaluation #optimisation #parallel #performance #thread #using
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA (SR, CIR, SSB, SSS, DBK, WmWH), pp. 73–82.
CGO-2007-Buck #parallel #programming
GPU Computing: Programming a Massively Parallel Processor (IB), p. 17.
ISMM-2007-Kirk #architecture #parallel
NVIDIA CUDA software and GPU parallel computing architecture (DK), pp. 103–104.
ICPR-v3-2006-MinM
Tensor Voting Accelerated by Graphics Processing Units (GPU) (CM, GGM), pp. 1103–1106.
SAC-2006-LejdforsO #embedded #generative #implementation
Implementing an embedded GPU language by combining translation and generation (CL, LO), pp. 1610–1614.

Bibliography of Software Language Engineering in Generated Hypertext (BibSLEIGH) is created and maintained by Dr. Vadim Zaytsev.
Hosted as a part of SLEBOK on GitHub.