Tag #gpu
172 papers:
- ASPLOS-2020-HuangJ0 #learning #memory management #named
- SwapAdvisor: Pushing Deep Learning Beyond the GPU Memory Limit via Smart Swapping (CCH, GJ, JL0), pp. 1341–1355.
- ASPLOS-2020-PengSD0MXYQ #learning #memory management #named
- Capuchin: Tensor-based GPU Memory Management for Deep Learning (XP, XS, HD, HJ0, WM, QX, FY, XQ), pp. 891–905.
- CGO-2020-ShobakiKM #approach #combinator #optimisation #using
- Optimizing occupancy and ILP on the GPU using a combinatorial approach (GS, AK, SM), pp. 133–144.
- IFM-2019-OsamaW #named #satisfiability
- SIGmA: GPU Accelerated Simplification of SAT Formulas (MO, AW), pp. 514–522.
- KDD-2019-FuZCR #named #optimisation #performance #robust #visualisation
- AtSNE: Efficient and Robust Visualization on GPU through Hierarchical Optimization (CF, YZ, DC, XR), pp. 176–186.
- ASE-2019-Laguna #detection #exception #float #named
- FPChecker: Detecting Floating-Point Exceptions in GPU Applications (IL), pp. 1126–1129.
- ASPLOS-2019-PhothilimthanaE #data flow #kernel #synthesis
- Swizzle Inventor: Data Movement Synthesis for GPU Kernels (PMP, ASE, AW0, AJ, BH, HB, SJK, VG, ET, RB), pp. 65–78.
- CGO-2019-LiL0 #optimisation #runtime
- Accelerating GPU Computing at Runtime with Binary Optimization (GL, LL, XF0), pp. 276–277.
- FASE-2019-PengR #effectiveness #kernel #named
- CLTestCheck: Measuring Test Effectiveness for GPU Kernels (CP, AR), pp. 315–331.
- ICFP-2018-ElsmanHAO #functional #higher-order #in the large #programming
- Static interpretation of higher-order modules in Futhark: functional GPU programming in the large (ME, TH, DA, CEO), p. 30.
- PLDI-2018-HongSKRKPRS #analysis #kernel #optimisation #using
- GPU code optimization using abstract kernel emulation and sensitivity analysis (CH, ASR, JK, PSR, SK, LNP, FR, PS), pp. 736–751.
- SAS-2018-AlurDS #independence #source code
- Block-Size Independence for GPU Programs (RA, JD, NS), pp. 107–126.
- ASPLOS-2018-Ausavarungnirun #concurrent #memory management #multi #named
- MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency (RA, VM, JL, SG, JG, AJ, CJR, OM), pp. 503–518.
- ASPLOS-2018-YaoMLSC #named #web
- Sugar: Secure GPU Acceleration in Web Browsers (ZY, ZM, YL, AAS, AC), pp. 519–534.
- CASE-2018-InuiEMZ #geometry #process #simulation #using
- Geometric Simulation of Infeed Grinding Process of Silicon Wafer Using GPU (MI, YE, TM, LZ), pp. 1519–1524.
- ESEC-FSE-2017-SorensenED #algorithm #kernel #multi
- Cooperative kernels: GPU multitasking for blocking algorithms (TS0, HE, AFD), pp. 431–441.
- CASE-2017-InuiKU #parallel #performance #using
- Fast extraction of cutter engagement features by using the parallel processing function of a GPU (MI, MK, NU), pp. 668–673.
- CC-2017-ShirakoHS #parallel #using
- Optimized two-level parallelization for GPU accelerators using the polyhedral model (JS, AH, VS), pp. 22–33.
- CGO-2017-GongCZUK #execution #hardware #kernel #named #scheduling
- TwinKernels: an execution model to improve GPU hardware scheduling at compile time (XG, ZC, AKZ, RU, DRK), pp. 39–49.
- CGO-2017-SteuwerRD #code generation #functional #information retrieval #named
- Lift: a functional data-parallel IR for high-performance GPU code generation (MS, TR, CD), pp. 74–85.
- CAV-2017-AlurDLS #detection #named #source code
- GPUDrano: Detecting Uncoalesced Accesses in GPU Programs (RA, JD, OSNL, NS), pp. 507–525.
- FM-2016-WijsNB #model checking
- GPUexplore 2.0: Unleashing GPU Explicit-State Model Checking (AW, TN, DB), pp. 694–701.
- PADL-2016-DovierFPV #implementation
- A GPU Implementation of the ASP Computation (AD, AF, EP, FV), pp. 30–47.
- PLDI-2016-SorensenD #fault #memory management
- Exposing errors related to weak memory in GPU applications (TS0, AFD), pp. 100–113.
- CC-2016-MajetiMBS #architecture #automation #cpu #generative #kernel #layout
- Automatic data layout generation and kernel mapping for CPU+GPU architectures (DM, KSM, RB, VS), pp. 240–250.
- CGO-2016-BarikFLHS #approach #black box #cpu #energy #scheduling
- A black-box approach to energy-aware scheduling on integrated CPU-GPU systems (RB, NF, BTL, CH, TS), pp. 70–81.
- VLDB-2015-BoghCA #parallel
- Work-Efficient Parallel Skyline Computation for the GPU (KSB, SC, IA), pp. 962–973.
- FDG-2015-MarkBMT #3d #game studies #generative
- Procedural Generation of 3D Caves for Games on the GPU (BM, TB, TM, JT).
- CIKM-2015-JoKB #analysis #matrix #multi #network #performance #scalability #social
- Efficient Sparse Matrix Multiplication on GPU for Large Social Network Analysis (YYJ, SWK, DHB), pp. 1261–1270.
- ICML-2015-TristanTS #estimation #performance
- Efficient Training of LDA on a GPU by Mean-for-Mode Estimation (JBT, JT, GLSJ), pp. 59–68.
- PLDI-2015-SharmaBA #source code #verification
- Verification of producer-consumer synchronization in GPU programs (RS, MB, AA), pp. 88–98.
- ICSE-v2-2015-Salgado #behaviour #cpu #interactive #kernel #profiling
- Profiling Kernels Behavior to Improve CPU / GPU Interactions (RS), pp. 754–756.
- SAC-2015-JoselliJC #animation #data type #named #proximity
- NGrid: a proximity data structure for fluids animation with GPU computing (MJ, JRdSJ, EC), pp. 1303–1308.
- SAC-2015-MartinCBGP #algorithm
- OpenACC-based GPU acceleration of an optical flow algorithm (NM, JC, GB, CG, MP), pp. 96–98.
- GPCE-2015-KolesnichenkoPN #contract #programming
- Contract-based general-purpose GPU programming (AK, CMP, SN, BM), pp. 75–84.
- ASPLOS-2015-AlglaveBDGKPSW #behaviour #concurrent #programming
- GPU Concurrency: Weak Behaviours and Programming Assumptions (JA, MB, AFD, GG, JK, DP, TS, JW), pp. 577–591.
- ASPLOS-2015-ParkPM #collaboration #multi #named
- Chimera: Collaborative Preemption for Multitasking on a Shared GPU (JJKP, YP, SAM), pp. 593–606.
- CGO-2015-LiYLZ #automation #memory management
- Automatic data placement into GPU on-chip memory resources (CL, YY, ZL, HZ), pp. 23–33.
- DAC-2015-HanF #analysis #approach #cpu #graph #scalability
- Transient-simulation guided graph sparsification approach to scalable harmonic balance (HB) analysis of post-layout RF circuits leveraging heterogeneous CPU-GPU computing systems (LH, ZF), p. 6.
- DAC-2015-KadjoAKG #approach #cpu #energy #mobile #performance #platform
- A control-theoretic approach for energy efficient CPU-GPU subsystem in mobile platforms (DK, RA, MK, PVG), p. 6.
- DATE-2015-GerumBR #performance #simulation
- Source level performance simulation of GPU cores (CG, OB, WR), pp. 217–222.
- DATE-2015-NguyenASS #platform #simulation
- Accelerating complex brain-model simulations on GPU platforms (HADN, ZAA, GS, CS), pp. 974–979.
- DATE-2015-ParkAHYL #big data #energy #low cost #memory management #performance
- Memory fast-forward: a low cost special function unit to enhance energy efficiency in GPU for big data processing (EP, JA, SH, SY, SL), pp. 1341–1346.
- DATE-2015-WangLWY
- Eliminating intra-warp conflict misses in GPU (BW, ZL, XW, WY), pp. 689–694.
- HPCA-2015-AroraMPJT #behaviour #benchmark #comprehension #cpu #metric #power management
- Understanding idle behavior and power gating mechanisms in the context of modern benchmarks on CPU-GPU Integrated systems (MA, SM, IP, NJ, DMT), pp. 366–377.
- HPCA-2015-LengZR #architecture
- GPU voltage noise: Characterization and hierarchical smoothing of spatial and temporal voltage noise interference in GPU architectures (JL, YZ, VJR), pp. 161–173.
- HPCA-2015-SethiaJM #memory management #named
- Mascar: Speeding up GPU warps by reducing memory pitstops (AS, DAJ, SAM), pp. 174–185.
- HPCA-2015-TiwariGRMRVOLDN #comprehension #design #fault #scalability
- Understanding GPU errors on large-scale HPC systems and the implications for system design and operation (DT, SG, JHR, DM, PR, SSV, DAGdO, DL, ND, POAN, LC, ASB), pp. 331–342.
- HPDC-2015-WahibM #automation #kernel #scalability
- Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications (MW, NM), pp. 259–270.
- HPDC-2015-XiaoCHZ #cpu #monte carlo
- Monte Carlo Based Ray Tracing in CPU-GPU Heterogeneous Systems and Applications in Radiation Therapy (KX, DZC, XSH, BZ), pp. 247–258.
- PDP-2015-BlaskiewiczZBD #network #parallel
- An Application of GPU Parallel Computing to Power Flow Calculation in HVDC Networks (PB, MZ, PB, PD), pp. 635–641.
- PDP-2015-KonstantinidisC #bound #kernel #memory management #performance
- A Practical Performance Model for Compute and Memory Bound GPU Kernels (EK, YC), pp. 651–658.
- PDP-2015-NakanoI #algorithm #implementation #memory management #parallel
- Optimality of Fundamental Parallel Algorithms on the Hierarchical Memory Machine, with GPU Implementation (KN, YI), pp. 626–634.
- PDP-2015-SoroushniaDPP #algorithm #implementation #parallel #pattern matching
- Parallel Implementation of Fuzzified Pattern Matching Algorithm on GPU (SS, MD, TP, JP), pp. 341–344.
- PDP-2015-YounessIMO #implementation #optimisation #performance #problem #satisfiability
- An Efficient Implementation of Ant Colony Optimization on GPU for the Satisfiability Problem (HAY, AI, MM, MO), pp. 230–235.
- PPoPP-2015-PiaoKOLKKL #adaptation #cpu #framework #javascript #named
- JAWS: a JavaScript framework for adaptive CPU-GPU work sharing (XP, CK, YO, HL, JK, HK, JWL), pp. 251–252.
- PPoPP-2015-ShiLDHJLWLZ #graph #hybrid #optimisation
- Optimization of asynchronous graph processing on GPU with hybrid coloring model (XS, JL, SD, BH, HJ, LL, ZW, XL, JZ), pp. 271–272.
- PPoPP-2015-WangDPWRO #graph #library #named
- Gunrock: a high-performance graph processing library on the GPU (YW, AAD, YP, YW, AR, JDO), pp. 265–266.
- TACAS-2015-Wijs #branch #similarity
- GPU Accelerated Strong and Branching Bisimilarity Checking (AW), pp. 368–383.
- VLDB-2015-HeZH14 #architecture #cpu #query
- In-Cache Query Co-Processing on Coupled CPU-GPU Architectures (JH, SZ, BH), pp. 329–340.
- ICEIS-v1-2014-PenaAMFF #algorithm #parallel #using
- An Improved Parallel Algorithm Using GPU for Siting Observers on Terrain (GCP, MVAA, SVGM, WRF, CRF), pp. 367–375.
- SEKE-2014-JuniorCMS #data analysis #repository
- Exploratory Data Analysis of Software Repositories via GPU Processing (JRDSJ, EC, LM, AS), pp. 495–500.
- OOPSLA-2014-HolkNSL #data type #memory management #programming language
- Region-based memory management for GPU programming languages: enabling rich data structures on a spartan host (EH, RN, JGS, AL), pp. 141–155.
- CGO-2014-XuWGLGQ #architecture #memory management #transaction
- Software Transactional Memory for GPU Architectures (YX, RW, NG, TL, LG, DQ), p. 1.
- DAC-2014-KoKYKH #cpu #platform #simulation
- Hardware-in-the-loop Simulation for CPU/GPU Heterogeneous Platforms (YK, TK, YY, MK, SH), p. 6.
- DAC-2014-PathaniaJPM #3d #cpu #game studies #mobile #power management
- Integrated CPU-GPU Power Management for 3D Mobile Games (AP, QJ, AP, TM), p. 6.
- DATE-2014-LeeL #3d #on the #reduction
- On GPU bus power reduction with 3D IC technologies (YJL, SKL), pp. 1–6.
- HPCA-2014-ElTantawyMOA #architecture #control flow #multi #performance #scalability
- A scalable multi-path microarchitecture for efficient GPU control flow (AE, JWM, MO, TMA), pp. 248–259.
- HPCA-2014-KimLJK #architecture #memory management #named #using
- GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management (YK, JL, JEJ, JK), pp. 546–557.
- HPCA-2014-NugterenBCB #distance #modelling #reuse
- A detailed GPU cache model based on reuse distance theory (CN, GJvdB, HC, HEB), pp. 37–48.
- HPCA-2014-PowerHW
- Supporting x86-64 address translation for 100s of GPU lanes (JP, MDH, DAW), pp. 568–578.
- OSDI-2014-KimHZHWWS #abstraction #named #network #source code
- GPUnet: Networking Abstractions for GPU Programs (SK, SH, XZ, YH, AW, EW, MS), pp. 201–216.
- PDP-2014-ArbelaezC #constraints #implementation #parallel
- A GPU Implementation of Parallel Constraint-Based Local Search (AA, PC), pp. 648–655.
- PDP-2014-BelliniBCMN #analysis #simulation
- Simulation and Analysis of the Blood Coagulation Cascade Accelerated on GPU (MB, DB, PC, GM, MSN), pp. 590–593.
- PDP-2014-BoobGP #automation #cpu #parallel #performance
- Automated Instantiation of Heterogeneous Fast Flow CPU/GPU Parallel Pattern Applications in Clouds (SB, HGV, AMP), pp. 162–169.
- PDP-2014-CamargosSGM #linear
- Iterative Solution on GPU of Linear Systems Arising from the A-V Edge-FEA of Time-Harmonic Electromagnetic Phenomena (AFPC, VCS, JMG, GM), pp. 365–371.
- PDP-2014-DefourM #fuzzy #library #named
- FuzzyGPU: A Fuzzy Arithmetic Library for GPU (DD, MM), pp. 624–631.
- PDP-2014-GargH #cpu #library #multi
- A Portable and High-Performance General Matrix-Multiply (GEMM) Library for GPUs and Single-Chip CPU/GPU Systems (RG, LJH), pp. 672–680.
- PDP-2014-IshigamiKN #algorithm #implementation
- GPU Implementation of Inverse Iteration Algorithm for Computing Eigenvectors (HI, KK, YN), pp. 664–671.
- PDP-2014-SanchezAGCMC #approach #named
- FRODRUG: A Virtual Screening GPU Accelerated Approach for Drug Discovery (SGS, ERA, JIG, PC, ASM, RC), pp. 594–600.
- PDP-2014-Topa #automaton #memory management #performance
- Cellular Automata Model Tuned for Efficient Computation on GPU with Global Memory Cache (PT), pp. 380–383.
- CAV-2014-BardsleyBCCDDKLQ #kernel #verification
- Engineering a Static Verification Tool for GPU Kernels (EB, AB, NC, PC, PD, AFD, JK, DL, SQ), pp. 226–242.
- VLDB-2013-Bress #hybrid #performance #query #why
- Why it is time for a HyPE: A Hybrid Query Processing Engine for Efficient GPU Coprocessing in DBMS (SB), pp. 1398–1403.
- VLDB-2013-HeLH #architecture #cpu
- Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture (JH, ML, BH), pp. 889–900.
- VLDB-2013-YuanL0 #query
- The Yin and Yang of Processing Data Warehousing Queries on GPU Devices (YY, RL, XZ), pp. 817–828.
- VLDB-2013-ZhangHHL #architecture #cpu #named #parallel #performance #query #towards
- OmniDB: Towards Portable and Efficient Query Processing on Parallel CPU/GPU Architectures (SZ, JH, BH, ML), pp. 1374–1377.
- CSMR-2013-ScannielloECG #using
- Using the GPU to Green an Intensive and Massive Computation System (GS, UE, GC, CG), pp. 384–387.
- ICFP-2013-McDonellCKL #functional #optimisation #source code
- Optimising purely functional GPU programs (TLM, MMTC, GK, BL), pp. 49–60.
- OOPSLA-2013-ChongDKKQ #abstraction #analysis #invariant #kernel
- Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels (NC, AFD, PHJK, JK, SQ), pp. 605–622.
- ASPLOS-2013-JooybarFODA #architecture #named
- GPUDet: a deterministic GPU architecture (HJ, WWLF, MO, JD, TMA), pp. 1–12.
- DAC-2013-HanZF #named #parallel #simulation
- TinySPICE: a parallel SPICE simulator on GPU for massively repeated small circuit simulations (LH, XZ, ZF), p. 8.
- DATE-2013-ZakharenkoAM #cpu #performance #using
- Characterizing the performance benefits of fused CPU/GPU systems using FusionSim (VZ, TMA, AM), pp. 685–688.
- HPCA-2013-LustigM #cpu #fine-grained #latency
- Reducing GPU offload latency via fine-grained CPU-GPU synchronization (DL, MM), pp. 354–365.
- HPCA-2013-RhuE #control flow #execution #performance
- The dual-path execution model for efficient GPU control flow (MR, ME), pp. 591–602.
- HPCA-2013-SinghSFOA #architecture
- Cache coherence for GPU architectures (IS, AS, WWLF, MO, TMA), pp. 578–590.
- HPDC-2013-YuZQYWG #game studies #named #scheduling
- VGRIS: virtualized GPU resource isolation and scheduling in cloud gaming (MY, CZ, ZQ, JY, YW, HG), pp. 203–214.
- PDP-2013-BukataS #algorithm #design #problem #scheduling
- A GPU Algorithm Design for Resource Constrained Project Scheduling Problem (LB, PS), pp. 367–374.
- PDP-2013-GuptaGV #3d #linear #simulation #using
- 3D Bubbly Flow Simulation on the GPU — Iterative Solution of a Linear System Using Sub-domain and Level-Set Deflation (RG, MBvG, CV), pp. 359–366.
- PDP-2013-LavilleMLPM #multi #simulation #using
- Using GPU for Multi-Agent Soil Simulation (GL, KM, CL, LP, NM), pp. 392–399.
- PPoPP-2013-DeoK #array #parallel
- Parallel suffix array and least common prefix for the GPU (MD, SK), pp. 197–206.
- PPoPP-2013-WuZZJS #algorithm #analysis #complexity #design #memory management
- Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU (BW, ZZ, EZZ, YJ, XS), pp. 57–68.
- PPoPP-2013-YangXFGLXLSYZ #algorithm #cpu #simulation
- A peta-scalable CPU-GPU algorithm for global atmospheric simulations (CY, WX, HF, LG, LL, YX, YL, JS, GY, WZ), pp. 1–12.
- ESOP-2013-CollingbourneDKQ #analysis #kernel #semantics #verification
- Interleaving and Lock-Step Semantics for Analysis and Verification of GPU Kernels (PC, AFD, JK, SQ), pp. 270–289.
- VLDB-2012-WangHLWZS #cpu #hybrid #image
- Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems (KW, YH, RL, FW, XZ, JHS), pp. 1543–1554.
- ICFP-2012-BergstromR
- Nested data-parallelism on the GPU (LB, JHR), pp. 247–258.
- GRAPHITE-2012-Cormie-Bowins #comparison #implementation #reachability
- A Comparison of Sequential and GPU Implementations of Iterative Methods to Compute Reachability Probabilities (ECB), pp. 20–34.
- CIKM-2012-KozawaAK #database #mining #nondeterminism #probability
- GPU acceleration of probabilistic frequent itemset mining from uncertain databases (YK, TA, HK), pp. 892–901.
- CIKM-2012-MasadaT #topic
- Extraction of topic evolutions from references in scientific articles and its GPU acceleration (TM, AT), pp. 1522–1526.
- OOPSLA-2012-BettsCDQT #kernel #named #verification
- GPUVerify: a verifier for GPU kernels (AB, NC, AFD, SQ, PT), pp. 113–132.
- PLDI-2012-LeungGAGJL #kernel #verification
- Verifying GPU kernels by test amplification (AL, MG, YA, RG, RJ, SL), pp. 383–394.
- SAC-2012-FazackerleyML #database
- GPU accelerated AES-CBC for database applications (SF, SMM, RL), pp. 873–878.
- SAC-2012-JiXWLTY #sequence
- High-throughput antibody sequence alignment based on GPU computing (GJ, ZX, XW, SL, MT, JY), pp. 1417–1418.
- CC-2012-UnkuleSQ #automation #kernel #locality #thread
- Automatic Restructuring of GPU Kernels for Exploiting Inter-thread Data Locality (SU, CS, AQ), pp. 21–40.
- CGO-2012-JablinJPLA #architecture #cpu
- Dynamically managed data for CPU-GPU architectures (TBJ, JAJ, PP, FL, DIA), pp. 165–174.
- CGO-2012-ZhangM #3d #clustering
- Auto-generation and auto-tuning of 3D stencil codes on GPU clusters (YZ, FM), pp. 155–164.
- DAC-2012-JeongESP #cpu #memory management
- A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC (MKJ, ME, CS, NCP), pp. 850–855.
- DAC-2012-KimLCKWYL #cpu #hybrid #in memory #memory management
- Hybrid DRAM/PRAM-based main memory for single-chip CPU/GPU (DK, SL, JC, DK, DHW, SY, SL), pp. 888–896.
- DAC-2012-RenCWZY #parallel #simulation
- Sparse LU factorization for parallel circuit simulation on GPU (LR, XC, YW, CZ, HY), pp. 1125–1130.
- DAC-2012-VincoCBF #architecture #named
- SAGA: SystemC acceleration on GPU architectures (SV, DC, VB, FF), pp. 115–120.
- HPCA-2012-LeeK #architecture #cpu #named #policy
- TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture (JL, HK), pp. 91–102.
- HPCA-2012-YangXMZ #architecture #cpu
- CPU-assisted GPGPU on fused CPU-GPU architectures (YY, PX, MM, HZ), pp. 103–114.
- PDP-2012-BoukedjarLB #bound #branch #cpu #parallel
- Parallel Branch and Bound on a CPU-GPU System (AB, MEL, DEB), pp. 392–398.
- PDP-2012-DilchM #algorithm #analysis #architecture #novel #optimisation #performance
- Optimization Techniques and Performance Analyses of Two Life Science Algorithms for Novel GPU Architectures (DD, EM), pp. 376–383.
- PDP-2012-FortinGG #towards
- Towards Solving the Table Maker’s Dilemma on GPU (PF, MG, SG), pp. 407–415.
- PDP-2012-RungsawangM #clustering #performance #rank
- Fast PageRank Computation on a GPU Cluster (AR, BM), pp. 450–456.
- PDP-2012-SpigaG #cpu #hybrid #library #migration #named #quantum
- phiGEMM: A CPU-GPU Library for Porting Quantum ESPRESSO on Hybrid Systems (FS, IG), pp. 368–375.
- PPoPP-2012-KimSLNJL #clustering #cpu #programming
- OpenCL as a unified programming model for heterogeneous CPU/GPU clusters (JK, SS, JL, JN, GJ, JL), pp. 299–300.
- PPoPP-2012-LiuAHLSZWT #implementation #named
- FlexBFS: a parallelism-aware implementation of breadth-first search on GPU (GL, HA, WH, XL, TS, WZ, XW, XT), pp. 279–280.
- PPoPP-2012-Mendez-LojoBP #analysis #implementation #points-to
- A GPU implementation of inclusion-based points-to analysis (MML, MB, KP), pp. 107–116.
- PPoPP-2012-MerrillGG #graph #scalability #traversal
- Scalable GPU graph traversal (DM, MG, ASG), pp. 117–128.
- PPoPP-2012-TaoBB #development #kernel #scalability #using
- Using GPU’s to accelerate stencil-based computation kernels for the development of large scale scientific applications on heterogeneous systems (JT, MB, SRB), pp. 287–288.
- JCDL-2011-LiZYWWW #recommendation #social #using
- A social network-aware top-N recommender system using GPU (RL, YZ, HY, XW, JW, BW), pp. 287–296.
- CSCW-2011-AspinR #3d #approach #multi
- A GPU based, projective multi-texturing approach to reconstructing the 3D human form for application in tele-presence (RAA, DJR), pp. 105–112.
- CIKM-2011-KrulisLBSS #architecture #distance #manycore #polynomial
- Processing the signature quadratic form distance on many-core GPU architectures (MK, JL, CB, TS, TS), pp. 2373–2376.
- PLDI-2011-JablinPJJBA #automation #communication #cpu #optimisation
- Automatic CPU-GPU communication management and optimization (TBJ, PP, JAJ, NPJ, SRB, DIA), pp. 142–151.
- ASPLOS-2011-ZhangJGTS #on the fly
- On-the-fly elimination of dynamic irregularities for GPU computing (EZZ, YJ, ZG, KT, XS), pp. 369–380.
- DAC-2011-ZhaoF #3d #parallel #performance #platform
- Fast multipole method on GPU: tackling 3-D capacitance extraction on massively parallel SIMD platforms (XZ, ZF), pp. 558–563.
- DAC-2011-ZhuDC #architecture #cpu #named
- Hermes: an integrated CPU/GPU microarchitecture for IP routing (YZ, YD, YC), pp. 1044–1049.
- DATE-2011-KangD #classification #metaprogramming #scalability
- Scalable packet classification via GPU metaprogramming (KK, YSD), pp. 871–874.
- DATE-2011-Wang #coordination #kernel #power management
- Coordinate strip-mining and kernel fusion to lower power consumption on GPU (GW), pp. 1218–1219.
- HPCA-2011-ZhangO #analysis #architecture #performance
- A quantitative performance analysis model for GPU architectures (YZ, JDO), pp. 382–393.
- HPDC-2011-LiLTCZ #3d #cpu #experience #re-engineering
- Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system (LL, XL, GT, MC, PZ), pp. 195–204.
- HPDC-2011-RaviBAC #framework #runtime
- Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework (VTR, MB, GA, STC), pp. 217–228.
- ISMM-2011-VeldemaP
- Iterative data-parallel mark&sweep on a GPU (RV, MP), pp. 1–10.
- PDP-2011-BarlasHJ #approach #case study #cpu #design #encryption #parallel
- An Analytical Approach to the Design of Parallel Block Cipher Encryption/Decryption: A CPU/GPU Case Study (GDB, AH, YAJ), pp. 247–251.
- PDP-2011-BoyerBE #multi #programming
- Dense Dynamic Programming on Multi GPU (VB, DEB, ME), pp. 545–551.
- PDP-2011-EschweilerBW #behaviour #performance
- Patterns of Inefficient Performance Behavior in GPU Applications (DE, DB, FW), pp. 262–266.
- PDP-2011-KarantasisP #abstraction #clustering #memory management #programming
- Programming GPU Clusters with Shared Memory Abstraction in Software (KIK, EDP), pp. 223–230.
- PDP-2011-YurtesenRAW #equation #implementation #integration
- SSE Vectorized and GPU Implementations of Arakawa’s Formula for Numerical Integration of Equations of Fluid Motion (EY, MR, MA, JW), pp. 341–348.
- PPoPP-2011-ZhengRQA #detection #named #source code
- GRace: a low-overhead mechanism for detecting data races in GPU programs (MZ, VTR, FQ, GA), pp. 135–146.
- Haskell-2010-MainlandM #haskell #named
- Nikola: embedding compiled GPU functions in Haskell (GM, GM), pp. 67–78.
- FSE-2010-LiG #kernel #scalability #smt #verification
- Scalable SMT-based verification of GPU kernel functions (GL, GG), pp. 187–196.
- ASPLOS-2010-WooL #named #programmable #using
- COMPASS: a programmable data prefetcher using idle GPU shaders (DHW, HHSL), pp. 297–310.
- DAC-2010-LuoWH #effectiveness #implementation
- An effective GPU implementation of breadth-first search (LL, MDFW, WmWH), pp. 52–55.
- DATE-2010-RathiDGCV #distance #feature model #implementation
- A GPU based implementation of Center-Surround Distribution Distance for feature extraction and matching (AR, MD, WG, RTC, NV), pp. 172–177.
- HPDC-2010-GharaibehAGR
- A GPU accelerated storage system (AG, SAK, SG, MR), pp. 167–178.
- HPDC-2010-LinWG #migration
- OpenGL application live migration with GPU acceleration in personal cloud (YL, WW, KG), pp. 280–283.
- PDP-2010-GaikwadT #linear #parallel
- Parallel Iterative Linear Solvers on GPU: A Financial Engineering Case (AG, IMT), pp. 607–614.
- PPoPP-2010-BaghsorkhiDPGH #adaptation #architecture #modelling #performance
- An adaptive performance modeling tool for GPU architectures (SSB, MD, SJP, WDG, WmWH), pp. 105–114.
- PPoPP-2010-SandesM #comparison #named #sequence #using
- CUDAlign: using GPU to accelerate the comparison of megabase genomic sequences (EFdOS, ACMAdM), pp. 137–146.
- PPoPP-2010-ZhangCO #performance
- Fast tridiagonal solvers on the GPU (YZ, JC, JDO), pp. 127–136.
- DAC-2009-ShiCHMTHW #analysis #grid #network #performance #power management
- GPU friendly fast Poisson solver for structured power grid network analysis (JS, YC, WH, LM, SXDT, PHH, XW), pp. 178–183.
- ICPR-2008-KauffmannP #automaton
- Cellular automaton for ultra-fast watershed transform on GPU (CK, NP), pp. 1–4.
- CGO-2008-RyooRSBUSH #optimisation #parallel #thread
- Program optimization space pruning for a multithreaded GPU (SR, CIR, SSS, SSB, SZU, JAS, WmWH), pp. 195–204.
- DAC-2008-Garland #manycore #matrix
- Sparse matrix computations on manycore GPU’s (MG), pp. 2–6.
- DATE-2008-CopeCL #configuration management #logic #memory management #using
- Using Reconfigurable Logic to Optimise GPU Memory Accesses (BC, PYKC, WL), pp. 44–49.
- PPoPP-2008-FernandesSS #parallel
- Massive parallel LDPC decoding on GPU (GFPF, LS, VMMdS), pp. 83–90.
- PPoPP-2008-RyooRBSKH #evaluation #optimisation #parallel #performance #thread #using
- Optimization principles and application performance evaluation of a multithreaded GPU using CUDA (SR, CIR, SSB, SSS, DBK, WmWH), pp. 73–82.
- CGO-2007-Buck #parallel #programming
- GPU Computing: Programming a Massively Parallel Processor (IB), p. 17.
- ISMM-2007-Kirk #architecture #parallel
- NVIDIA CUDA software and GPU parallel computing architecture (DK), pp. 103–104.
- ICPR-v3-2006-MinM
- Tensor Voting Accelerated by Graphics Processing Units (GPU) (CM, GGM), pp. 1103–1106.
- SAC-2006-LejdforsO #embedded #generative #implementation
- Implementing an embedded GPU language by combining translation and generation (CL, LO), pp. 1610–1614.