72 papers:
- DAC-2015-JungC #embedded #multi #named #performance #simulation
- ΣVP: host-GPU multiplexing for efficient simulation of multiple embedded GPUs on virtual platforms (YJ, LPC), p. 6.
- DATE-2015-RahimiGCBG #approximate #energy #memory management
- Approximate associative memristive memory for energy-efficient GPUs (AR, AG, KTC, LB, RKG), pp. 1497–1502.
- VLDB-2015-ZhangWYGLZ #in memory #named #throughput
- Mega-KV: A Case for GPUs to Maximize the Throughput of In-Memory Key-Value Stores (KZ, KW, YY, LG, RL, XZ), pp. 1226–1237.
- SAC-2015-RochaRCOMVADGF #algorithm #classification #dataset #documentation #named #performance #using
- G-KNN: an efficient document classification algorithm for sparse datasets on GPUs using KNN (LCdR, GSR, RC, RSO, DM, FV, GA, SD, MAG, RF), pp. 1335–1338.
- SAC-2015-RodriguesJD #recommendation #using
- Accelerating recommender systems using GPUs (AVR, AJ, ID), pp. 879–884.
- ASPLOS-2015-AgarwalNSOK #memory management
- Page Placement Strategies for GPUs within Heterogeneous Memory Systems (NA, DWN, MS, MO, SWK), pp. 607–618.
- CGO-2015-FauziaPS #memory management
- Characterizing and enhancing global memory data coalescing on GPUs (NF, LNP, PS), pp. 12–22.
- HPCA-2015-AgarwalNOKW
- Unlocking bandwidth for GPUs in CC-NUMA systems (NA, DWN, MO, SWK, TFW), pp. 354–365.
- HPCA-2015-LiuLJCT #comprehension #empirical
- Understanding the virtualization “Tax” of scale-out pass-through GPUs in GaaS clouds: An empirical study (ML, TL, NJ, AC, VT), pp. 259–270.
- HPCA-2015-XieLWSW #coordination
- Coordinated static and dynamic cache bypassing for GPUs (XX, YL, YW, GS, TW), pp. 76–88.
- PPoPP-2015-SeoKK #graph #named #scalability #streaming
- GStream: a graph streaming processing method for large-scale graphs on GPUs (HS, JK, MSK), pp. 253–254.
- ICLP-2015-DovierFPV #execution #parallel
- Parallel Execution of the ASP Computation — an Investigation on GPUs (AD, AF, EP, FV).
- ASE-2014-RajanSSK #execution #using
- Accelerated test execution using GPUs (AR, SS, PS, DK), pp. 97–102.
- DAC-2014-NandakumarM #analysis
- System-Level Floorplan-Aware Analysis of Integrated CPU-GPUs (VSN, MMS), p. 6.
- DAC-2014-SamavatianAAS #architecture #performance
- An Efficient STT-RAM Last Level Cache Architecture for GPUs (MHS, HA, MA, HSA), p. 6.
- DATE-2014-AguileraLFMSK #algorithm #clustering #multi #process
- Process variation-aware workload partitioning algorithms for GPUs supporting spatial-multitasking (PA, JL, AFF, KM, MJS, NSK), pp. 1–6.
- VLDB-2014-WangZYMLD0 #concurrent #query
- Concurrent Analytical Query Processing with GPUs (KW, KZ, YY, SM, RL, XD, XZ), pp. 1011–1022.
- TACAS-2014-WijsB #manycore #named #on the fly #using
- GPUexplore: Many-Core On-the-Fly State Space Exploration Using GPUs (AW, DB), pp. 233–247.
- ICML-c1-2014-GiesekeHOI #nearest neighbour #query
- Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs (FG, JH, CEO, CI), pp. 172–180.
- PADL-2014-CampeottoPDFP #constraints #theorem proving
- Exploring the Use of GPUs in Constraint Solving (FC, ADP, AD, FF, EP), pp. 152–167.
- ASPLOS-2014-PichaiHB #architecture #cpu #design #memory management
- Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces (BP, LH, AB), pp. 743–758.
- CC-2014-AnantpurG #control flow
- Taming Control Divergence in GPUs through Control Flow Linearization (JA, RG), pp. 133–153.
- CC-2014-WangPFO #legacy #parallel
- Exploitation of GPUs for the Parallelisation of Probably Parallel Legacy Code (ZW, DCP, BF, MFPO), pp. 154–173.
- CGO-2014-BarikKMLSHNA #c++ #performance
- Efficient Mapping of Irregular C++ Applications to Integrated GPUs (RB, RK, DM, BTL, TS, CH, YN, ARAT), p. 33.
- CGO-2014-GrosserCHSV #hybrid
- Hybrid Hexagonal/Classical Tiling for GPUs (TG, AC, JH, PS, SV), p. 66.
- CGO-2014-JuegaGTC #adaptation #automation #code generation #parametricity
- Adaptive Mapping and Parameter Selection Scheme to Improve Automatic Code Generation for GPUs (JCJ, JIG, CT, FC), p. 251.
- CGO-2014-WuDSABGY #execution #query #relational
- Red Fox: An Execution Environment for Relational Query Processing on GPUs (HW, GFD, TS, MA, SB, MG, SY), p. 44.
- HPCA-2014-HechtmanCHTBHRW #approach #consistency #named
- QuickRelease: A throughput-oriented approach to release consistency on GPUs (BAH, SC, DRH, YT, BMB, MDH, SKR, DAW), pp. 189–200.
- HPCA-2014-LakshminarayanaK #algorithm #graph
- Spare register aware prefetching for graph algorithms on GPUs (NBL, HK), pp. 614–625.
- HPCA-2014-PalframanKL #fault
- Precision-aware soft error protection for GPUs (DJP, NSK, MHL), pp. 49–59.
- HPCA-2014-XiangYZ
- Warp-level divergence in GPUs: Characterization, impact, and mitigation (PX, YY, HZ), pp. 284–295.
- HPDC-2014-KhorasaniVGB #graph #named
- CuSha: vertex-centric graph processing on GPUs (FK, KV, RG, LNB), pp. 239–252.
- ISMM-2014-EgielskiHZ #parallel
- Massive atomics for massive parallelism on GPUs (IJE, JH, EZZ), pp. 93–103.
- PPoPP-2014-BauerTA #named #performance
- Singe: leveraging warp specialization for high performance on GPUs (MB, ST, AA), pp. 119–130.
- PPoPP-2014-MaAC #algorithm #analysis #manycore #thread
- Theoretical analysis of classic algorithms on highly-threaded many-core GPUs (LM, KA, RDC), pp. 391–392.
- PPoPP-2014-SandesMMMA #comparison #parallel #sequence
- Fine-grain parallel megabase sequence comparison with multiple heterogeneous GPUs (EFdOS, GM, ACMAdM, XM, EA), pp. 383–384.
- PPoPP-2014-YanLZZ #framework #named
- yaSpMV: yet another SpMV framework on GPUs (SY, CL, YZ, HZ), pp. 107–118.
- DATE-2013-BertaccoCBFVKP #on the
- On the use of GP-GPUs for accelerating compute-intensive EDA applications (VB, DC, NB, FF, SV, AMK, HDP), pp. 1357–1366.
- ASPLOS-2013-SilbersteinFKW #file system #named
- GPUfs: integrating a file system with GPUs (MS, BF, IK, EW), pp. 485–498.
- CGO-2013-LaiS #analysis #bound #optimisation #performance
- Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs (JL, AS), p. 10.
- HPDC-2013-SajjapongseWB #clustering #multi #runtime
- A preemption-based runtime to efficiently schedule multi-process applications on heterogeneous clusters with GPUs (KS, XW, MB), pp. 179–190.
- PPoPP-2013-NasreBP #algorithm
- Morph algorithms on GPUs (RN, MB, KP), pp. 147–156.
- PPoPP-2013-YanLZ #algorithm #named #performance
- StreamScan: fast scan algorithms for GPUs without global barrier synchronization (SY, GL, YZ), pp. 229–238.
- PPoPP-2013-YuB #automaton #performance #regular expression
- Exploring different automata representations for efficient regular expression matching on GPUs (XY, MB), pp. 287–288.
- DATE-2012-BombieriFG #fault #framework #functional #named #simulation #verification
- FAST-GP: An RTL functional verification framework based on fault simulation on GP-GPUs (NB, FF, VG), pp. 562–565.
- DATE-2012-LiangCZRZJC #3d #implementation #locality #optimisation #performance #realtime
- Real-time implementation and performance optimization of 3D sound localization on GPUs (YL, ZC, SZ, KR, YZ, DLJ, DC), pp. 832–835.
- DATE-2012-WangRR #energy #runtime
- Run-time power-gating in caches of GPUs for leakage energy savings (YW, SR, NR), pp. 300–303.
- ESOP-2012-HabermaierK #correctness #execution #on the
- On the Correctness of the SIMT Execution Model of GPUs (AH, AK), pp. 316–335.
- PLDI-2012-DubachCRBF #architecture #compilation
- Compiling a high-level language for GPUs: (via language support for architectures and compilers) (CD, PC, RMR, DFB, SJF), pp. 1–12.
- HPDC-2012-BecchiSGPRC #clustering #memory management #multitenancy #runtime
- A virtual memory based runtime to support multi-tenancy in clusters with GPUs (MB, KS, IG, AMP, VTR, STC), pp. 97–108.
- HPDC-2012-ChenA #effectiveness #memory management #optimisation #pipes and filters
- Optimizing MapReduce for GPUs with effective shared memory usage (LC, GA), pp. 199–210.
- ISMM-2012-MaasRMAJK #garbage collection
- GPUs as an opportunity for offloading garbage collection (MM, PR, JM, KA, ADJ, JK), pp. 25–36.
- PPoPP-2012-LiLSGGR #generative #named #testing #verification
- GKLEE: concolic verification and test generation for GPUs (GL, PL, GS, GG, IG, SPR), pp. 215–224.
- PPoPP-2012-ZhongH #bibliography #graph
- An overview of Medusa: simplified graph processing on GPUs (JZ, BH), pp. 283–284.
- VLDB-2011-YangPS #graph #mining #multi #performance
- Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining (XY, SP, PS), pp. 231–242.
- GPCE-2011-NystromWD #compilation #named #runtime #scala
- Firepile: run-time compilation for GPUs in scala (NN, DW, KD), pp. 107–116.
- POPL-2011-PrabhuRMH #analysis #named
- EigenCFA: accelerating flow analysis with GPUs (TP, SR, MM, MWH), pp. 511–522.
- PPoPP-2011-GrossetZLVH #graph
- Evaluating graph coloring on GPUs (AVPG, PZ, SL, SV, MWH), pp. 297–298.
- PPoPP-2011-KimKLL #image #multi
- Achieving a single compute device image in OpenCL for multiple GPUs (JK, HK, JHL, JL), pp. 277–288.
- SOSP-2011-RossbachCSRW #abstraction #named #operating system
- PTask: operating system abstractions to manage GPUs as compute devices (CJR, JC, MS, BR, EW), pp. 233–248.
- DAC-2010-FengZ #analysis #grid #parallel #power management #robust
- Parallel multigrid preconditioning on graphics processing units (GPUs) for robust power grid analysis (ZF, ZZ), pp. 661–666.
- DAC-2010-WangZD #distributed #logic #parallel #simulation
- Distributed time, conservative parallel logic simulation on GPUs (BDW, YZ, YD), pp. 761–766.
- SIGMOD-2010-KimCSSNKLBD #architecture #named #performance
- FAST: fast architecture sensitive tree search on modern CPUs and GPUs (CK, JC, NS, ES, ADN, TK, VWL, SAB, PD), pp. 339–350.
- SIGMOD-2010-SatishKCNLKD #performance
- Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort (NS, CK, JC, ADN, VWL, DK, PD), pp. 351–362.
- SAC-2010-JiCW #scalability #simulation
- A simulation of large-scale groundwater flow on CUDA-enabled GPUs (XJ, TC, QW), pp. 2402–2403.
- PPoPP-2010-ChoiSV #modelling #multi
- Model-driven autotuning of sparse matrix-vector multiply on GPUs (JC, AS, RWV), pp. 115–126.
- DAC-2009-ChatterjeeDB #simulation
- Event-driven gate-level simulation with GP-GPUs (DC, AD, VB), pp. 557–562.
- CGO-2009-UdupaGT #execution #pipes and filters #source code
- Software Pipelined Execution of Stream Programs on GPUs (AU, RG, MJT), pp. 200–209.
- PPoPP-2009-MaA #compilation #data mining #mining #runtime
- A compiler and runtime system for enabling data mining applications on GPUs (WM, GA), pp. 287–288.
- ICPR-2008-GongC #graph #learning #online #optimisation #realtime #segmentation #using
- Real-time foreground segmentation on GPUs using local online learning and global graph cut optimization (MG, LC), pp. 1–4.
- ASPLOS-2006-TarditiPO #named #parallel #using
- Accelerator: using data parallelism to program GPUs for general-purpose uses (DT, SP, JO), pp. 325–335.
- ICDAR-2005-SteinkrauSB #algorithm #machine learning #using
- Using GPUs for Machine Learning Algorithms (DS, PYS, IB), pp. 1115–1119.