72 papers:
DAC-2015-JungC #embedded #multi #named #performance #simulation- ΣVP: host-GPU multiplexing for efficient simulation of multiple embedded GPUs on virtual platforms (YJ, LPC), p. 6.
DATE-2015-RahimiGCBG #approximate #energy #memory management- Approximate associative memristive memory for energy-efficient GPUs (AR, AG, KTC, LB, RKG), pp. 1497–1502.
VLDB-2015-ZhangWYGLZ #in memory #named #throughput- Mega-KV: A Case for GPUs to Maximize the Throughput of In-Memory Key-Value Stores (KZ, KW, YY, LG, RL, XZ), pp. 1226–1237.
SAC-2015-RochaRCOMVADGF #algorithm #classification #dataset #documentation #named #performance #using- G-KNN: an efficient document classification algorithm for sparse datasets on GPUs using KNN (LCdR, GSR, RC, RSO, DM, FV, GA, SD, MAG, RF), pp. 1335–1338.
SAC-2015-RodriguesJD #recommendation #using- Accelerating recommender systems using GPUs (AVR, AJ, ID), pp. 879–884.
ASPLOS-2015-AgarwalNSOK #memory management- Page Placement Strategies for GPUs within Heterogeneous Memory Systems (NA, DWN, MS, MO, SWK), pp. 607–618.
CGO-2015-FauziaPS #memory management- Characterizing and enhancing global memory data coalescing on GPUs (NF, LNP, PS), pp. 12–22.
HPCA-2015-AgarwalNOKW- Unlocking bandwidth for GPUs in CC-NUMA systems (NA, DWN, MO, SWK, TFW), pp. 354–365.
HPCA-2015-LiuLJCT #comprehension #empirical- Understanding the virtualization “Tax” of scale-out pass-through GPUs in GaaS clouds: An empirical study (ML, TL, NJ, AC, VT), pp. 259–270.
HPCA-2015-XieLWSW #coordination- Coordinated static and dynamic cache bypassing for GPUs (XX, YL, YW, GS, TW), pp. 76–88.
PPoPP-2015-SeoKK #graph #named #scalability #streaming- GStream: a graph streaming processing method for large-scale graphs on GPUs (HS, JK, MSK), pp. 253–254.
ICLP-2015-DovierFPV #execution #parallel- Parallel Execution of the ASP Computation — an Investigation on GPUs (AD, AF, EP, FV).
ASE-2014-RajanSSK #execution #using- Accelerated test execution using GPUs (AR, SS, PS, DK), pp. 97–102.
DAC-2014-NandakumarM #analysis- System-Level Floorplan-Aware Analysis of Integrated CPU-GPUs (VSN, MMS), p. 6.
DAC-2014-SamavatianAAS #architecture #performance- An Efficient STT-RAM Last Level Cache Architecture for GPUs (MHS, HA, MA, HSA), p. 6.
DATE-2014-AguileraLFMSK #algorithm #clustering #multi #process- Process variation-aware workload partitioning algorithms for GPUs supporting spatial-multitasking (PA, JL, AFF, KM, MJS, NSK), pp. 1–6.
VLDB-2014-WangZYMLD0 #concurrent #query- Concurrent Analytical Query Processing with GPUs (KW, KZ, YY, SM, RL, XD, XZ), pp. 1011–1022.
TACAS-2014-WijsB #manycore #named #on the fly #using- GPUexplore: Many-Core On-the-Fly State Space Exploration Using GPUs (AW, DB), pp. 233–247.
ICML-c1-2014-GiesekeHOI #nearest neighbour #query- Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs (FG, JH, CEO, CI), pp. 172–180.
PADL-2014-CampeottoPDFP #constraints #theorem proving- Exploring the Use of GPUs in Constraint Solving (FC, ADP, AD, FF, EP), pp. 152–167.
ASPLOS-2014-PichaiHB #architecture #cpu #design #memory management- Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces (BP, LH, AB), pp. 743–758.
CC-2014-AnantpurG #control flow- Taming Control Divergence in GPUs through Control Flow Linearization (JA, RG), pp. 133–153.
CC-2014-WangPFO #legacy #parallel- Exploitation of GPUs for the Parallelisation of Probably Parallel Legacy Code (ZW, DCP, BF, MFPO), pp. 154–173.
CGO-2014-BarikKMLSHNA #c++ #performance- Efficient Mapping of Irregular C++ Applications to Integrated GPUs (RB, RK, DM, BTL, TS, CH, YN, ARAT), p. 33.
CGO-2014-GrosserCHSV #hybrid- Hybrid Hexagonal/Classical Tiling for GPUs (TG, AC, JH, PS, SV), p. 66.
CGO-2014-JuegaGTC #adaptation #automation #code generation #parametricity- Adaptive Mapping and Parameter Selection Scheme to Improve Automatic Code Generation for GPUs (JCJ, JIG, CT, FC), p. 251.
CGO-2014-WuDSABGY #execution #query #relational- Red Fox: An Execution Environment for Relational Query Processing on GPUs (HW, GFD, TS, MA, SB, MG, SY), p. 44.
HPCA-2014-HechtmanCHTBHRW #approach #consistency #named- QuickRelease: A throughput-oriented approach to release consistency on GPUs (BAH, SC, DRH, YT, BMB, MDH, SKR, DAW), pp. 189–200.
HPCA-2014-LakshminarayanaK #algorithm #graph- Spare register aware prefetching for graph algorithms on GPUs (NBL, HK), pp. 614–625.
HPCA-2014-PalframanKL #fault- Precision-aware soft error protection for GPUs (DJP, NSK, MHL), pp. 49–59.
HPCA-2014-XiangYZ- Warp-level divergence in GPUs: Characterization, impact, and mitigation (PX, YY, HZ), pp. 284–295.
HPDC-2014-KhorasaniVGB #graph #named- CuSha: vertex-centric graph processing on GPUs (FK, KV, RG, LNB), pp. 239–252.
ISMM-2014-EgielskiHZ #parallel- Massive atomics for massive parallelism on GPUs (IJE, JH, EZZ), pp. 93–103.
PPoPP-2014-BauerTA #named #performance- Singe: leveraging warp specialization for high performance on GPUs (MB, ST, AA), pp. 119–130.
PPoPP-2014-MaAC #algorithm #analysis #manycore #thread- Theoretical analysis of classic algorithms on highly-threaded many-core GPUs (LM, KA, RDC), pp. 391–392.
PPoPP-2014-SandesMMMA #comparison #parallel #sequence- Fine-grain parallel megabase sequence comparison with multiple heterogeneous GPUs (EFdOS, GM, ACMAdM, XM, EA), pp. 383–384.
PPoPP-2014-YanLZZ #framework #named- yaSpMV: yet another SpMV framework on GPUs (SY, CL, YZ, HZ), pp. 107–118.
DATE-2013-BertaccoCBFVKP #on the- On the use of GP-GPUs for accelerating compute-intensive EDA applications (VB, DC, NB, FF, SV, AMK, HDP), pp. 1357–1366.
ASPLOS-2013-SilbersteinFKW #file system #named- GPUfs: integrating a file system with GPUs (MS, BF, IK, EW), pp. 485–498.
CGO-2013-LaiS #analysis #bound #optimisation #performance- Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs (JL, AS), p. 10.
HPDC-2013-SajjapongseWB #clustering #multi #runtime- A preemption-based runtime to efficiently schedule multi-process applications on heterogeneous clusters with GPUs (KS, XW, MB), pp. 179–190.
PPoPP-2013-NasreBP #algorithm- Morph algorithms on GPUs (RN, MB, KP), pp. 147–156.
PPoPP-2013-YanLZ #algorithm #named #performance- StreamScan: fast scan algorithms for GPUs without global barrier synchronization (SY, GL, YZ), pp. 229–238.
PPoPP-2013-YuB #automaton #performance #regular expression- Exploring different automata representations for efficient regular expression matching on GPUs (XY, MB), pp. 287–288.
DATE-2012-BombieriFG #fault #framework #functional #named #simulation #verification- FAST-GP: An RTL functional verification framework based on fault simulation on GP-GPUs (NB, FF, VG), pp. 562–565.
DATE-2012-LiangCZRZJC #3d #implementation #locality #optimisation #performance #realtime- Real-time implementation and performance optimization of 3D sound localization on GPUs (YL, ZC, SZ, KR, YZ, DLJ, DC), pp. 832–835.
DATE-2012-WangRR #energy #runtime- Run-time power-gating in caches of GPUs for leakage energy savings (YW, SR, NR), pp. 300–303.
ESOP-2012-HabermaierK #correctness #execution #on the- On the Correctness of the SIMT Execution Model of GPUs (AH, AK), pp. 316–335.
PLDI-2012-DubachCRBF #architecture #compilation- Compiling a high-level language for GPUs: (via language support for architectures and compilers) (CD, PC, RMR, DFB, SJF), pp. 1–12.
HPDC-2012-BecchiSGPRC #clustering #memory management #multitenancy #runtime- A virtual memory based runtime to support multi-tenancy in clusters with GPUs (MB, KS, IG, AMP, VTR, STC), pp. 97–108.
HPDC-2012-ChenA #effectiveness #memory management #optimisation #pipes and filters- Optimizing MapReduce for GPUs with effective shared memory usage (LC, GA), pp. 199–210.
ISMM-2012-MaasRMAJK #garbage collection- GPUs as an opportunity for offloading garbage collection (MM, PR, JM, KA, ADJ, JK), pp. 25–36.
PPoPP-2012-LiLSGGR #generative #named #testing #verification- GKLEE: concolic verification and test generation for GPUs (GL, PL, GS, GG, IG, SPR), pp. 215–224.
PPoPP-2012-ZhongH #bibliography #graph- An overview of Medusa: simplified graph processing on GPUs (JZ, BH), pp. 283–284.
VLDB-2011-YangPS #graph #mining #multi #performance- Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining (XY, SP, PS), pp. 231–242.
GPCE-2011-NystromWD #compilation #named #runtime #scala- Firepile: run-time compilation for GPUs in Scala (NN, DW, KD), pp. 107–116.
POPL-2011-PrabhuRMH #analysis #named- EigenCFA: accelerating flow analysis with GPUs (TP, SR, MM, MWH), pp. 511–522.
PPoPP-2011-GrossetZLVH #graph- Evaluating graph coloring on GPUs (AVPG, PZ, SL, SV, MWH), pp. 297–298.
PPoPP-2011-KimKLL #image #multi- Achieving a single compute device image in OpenCL for multiple GPUs (JK, HK, JHL, JL), pp. 277–288.
SOSP-2011-RossbachCSRW #abstraction #named #operating system- PTask: operating system abstractions to manage GPUs as compute devices (CJR, JC, MS, BR, EW), pp. 233–248.
DAC-2010-FengZ #analysis #grid #parallel #power management #robust- Parallel multigrid preconditioning on graphics processing units (GPUs) for robust power grid analysis (ZF, ZZ), pp. 661–666.
DAC-2010-WangZD #distributed #logic #parallel #simulation- Distributed time, conservative parallel logic simulation on GPUs (BDW, YZ, YD), pp. 761–766.
SIGMOD-2010-KimCSSNKLBD #architecture #named #performance- FAST: fast architecture sensitive tree search on modern CPUs and GPUs (CK, JC, NS, ES, ADN, TK, VWL, SAB, PD), pp. 339–350.
SIGMOD-2010-SatishKCNLKD #performance- Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort (NS, CK, JC, ADN, VWL, DK, PD), pp. 351–362.
SAC-2010-JiCW #scalability #simulation- A simulation of large-scale groundwater flow on CUDA-enabled GPUs (XJ, TC, QW), pp. 2402–2403.
PPoPP-2010-ChoiSV #modelling #multi- Model-driven autotuning of sparse matrix-vector multiply on GPUs (JC, AS, RWV), pp. 115–126.
DAC-2009-ChatterjeeDB #simulation- Event-driven gate-level simulation with GP-GPUs (DC, AD, VB), pp. 557–562.
CGO-2009-UdupaGT #execution #pipes and filters #source code- Software Pipelined Execution of Stream Programs on GPUs (AU, RG, MJT), pp. 200–209.
PPoPP-2009-MaA #compilation #data mining #mining #runtime- A compiler and runtime system for enabling data mining applications on GPUs (WM, GA), pp. 287–288.
ICPR-2008-GongC #graph #learning #online #optimisation #realtime #segmentation #using- Real-time foreground segmentation on GPUs using local online learning and global graph cut optimization (MG, LC), pp. 1–4.
ASPLOS-2006-TarditiPO #named #parallel #using- Accelerator: using data parallelism to program GPUs for general-purpose uses (DT, SP, JO), pp. 325–335.
ICDAR-2005-SteinkrauSB #algorithm #machine learning #using- Using GPUs for Machine Learning Algorithms (DS, PYS, IB), pp. 1115–1119.