BibSLEIGH — dataset stem

Used together with:

larg (48)
base (22)
cluster (21)
mine (19)
evalu (18)

Stem dataset$

205 papers:

DRR-2015-ChenSWLHI #analysis #dataset #documentation #layout: Ground truth model, tool, and dataset for layout analysis of historical documents (KC, MS, HW, ML, JH, RI), p. 940204.
HT-2015-RoutB #algorithm #dataset #ranking #twitter: A Human-annotated Dataset for Evaluating Tweet Ranking Algorithms (DPR, KB), pp. 95–99.
PODS-2015-Cormode #dataset #scalability #summary: Compact Summaries over Large Datasets (GC), pp. 157–158.
VLDB-2015-BhattacherjeeCH #dataset #trade-off #version control: Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff (SB, AC, SH, AD, AGP), pp. 1346–1357.
VLDB-2015-HarbiAKM #dataset #query #rdf: Evaluating SPARQL Queries on Massive RDF Datasets (RH, IA, PK, NM), pp. 1848–1859.
ICSME-2015-JansenH #dataset #industrial #smell #spreadsheet: Code smells in spreadsheet formulas revisited on an industrial dataset (BJ, FH), pp. 372–380.
MSR-2015-AltingerSDW #dataset #embedded #fault #industrial #modelling #novel #predict: A Novel Industry Grade Dataset for Fault Prediction Based on Model-Driven Developed Automotive Embedded Software (HA, SS, YD, FW), pp. 494–497.
MSR-2015-GermanAH #dataset #git #linux #process: A Dataset of the Activity of the Git Super-repository of Linux in 2012 (DMG, BA, AEH), pp. 470–473.
MSR-2015-HabayebMMBB #dataset #fault: The Firefox Temporal Defect Dataset (MH, AVM, SSM, LB, AB), pp. 498–501.
MSR-2015-KrutzMMRPFS #android #dataset #open source: A Dataset of Open-Source Android Applications (DEK, MM, SAM, AR, JP, AF, JS), pp. 522–525.
MSR-2015-MauczkaBSG #commit #dataset #developer: Dataset of Developer-Labeled Commit Messages (AM, FB, CS, TG), pp. 490–493.
MSR-2015-OhiraKYYMLFHIM #classification #dataset #debugging: A Dataset of High Impact Bugs: Manually-Classified Issue Reports (MO, YK, YY, HY, YM, NL, KF, HH, AI, KiM), pp. 518–521.
MSR-2015-PalombaNTBOPL #dataset #evaluation #named #open data #smell: Landfill: An Open Dataset of Code Smells with Public Evaluation (FP, DDN, MT, GB, RO, DP, ADL), pp. 482–485.
MSR-2015-SawantB #api #dataset: A Dataset for API Usage (AAS, AB), pp. 506–509.
MSR-2015-WermelingerY #architecture #dataset #evolution: An Architectural Evolution Dataset (MW, YY), pp. 502–505.
MSR-2015-Zacchiroli #dataset #metadata #source code: The Debsources Dataset: Two Decades of Debian Source Code Metadata (SZ), pp. 466–469.
SCAM-2015-AivaloglouHH #dataset #scalability #spreadsheet: A grammar for spreadsheet formulas evaluated on two large datasets (EA, DH, FH), pp. 121–130.
CSCW-2015-QuattroneCM #bias #dataset: There’s No Such Thing as the Perfect Map: Quantifying Bias in Spatial Crowd-sourcing Datasets (GQ, LC, PDM), pp. 1021–1032.
ICEIS-v2-2015-SarinhoLS #dataset #linked data #open data #question: Can You Find All the Data You Expect in a Linked Dataset? (WTS, BFL, DS), pp. 648–655.
ICML-2015-BarbosaENW #dataset #distributed #power of: The Power of Randomization: Distributed Submodular Maximization on Massive Datasets (RdPB, AE, HLN, JW), pp. 1236–1244.
ICML-2015-MaLF #analysis #canonical #correlation #dataset #linear #scalability: Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis (ZM, YL, DPF), pp. 169–178.
KDD-2015-CaoWYR #dataset #online #scalability: Online Outlier Exploration Over Large Datasets (LC, MW, DY, EAR), pp. 89–98.
RecSys-2015-Ben-ShimonTFSRH #challenge #dataset: RecSys Challenge 2015 and the YOOCHOOSE Dataset (DBS, AT, MF, BS, LR, JH), pp. 357–358.
SIGIR-2015-MorenoD #adaptation #dataset #metric #semistructured data: Adapted B-CUBED Metrics to Unbalanced Datasets (JGM, GD), pp. 911–914.
SAC-2015-RochaRCOMVADGF #algorithm #classification #dataset #documentation #named #performance #using: G-KNN: an efficient document classification algorithm for sparse datasets on GPUs using KNN (LCdR, GSR, RC, RSO, DM, FV, GA, SD, MAG, RF), pp. 1335–1338.
ICSE-v2-2015-HermansM #analysis #dataset #email #spreadsheet: Enron’s Spreadsheets and Related Emails: A Dataset and Analysis (FH, ERMH), pp. 7–16.
DRR-2014-BrunoL #dataset #documentation #open data #recognition #research: The Lehigh Steel Collection: a new open dataset for document recognition research (BB, DPL), p. ?–9.
SIGMOD-2014-SatishSPSPHSYD #dataset #framework #graph #navigation #using: Navigating the maze of graph analytics frameworks using massive graph datasets (NS, NS, MMAP, JS, JP, MAH, SS, ZY, PD), pp. 979–990.
VLDB-2015-MozafariSFJM14 #dataset #learning #scalability: Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning (BM, PS, MJF, MIJ, SM), pp. 125–136.
ICSME-2014-ThongtanunamYYKCFI #bibliography #code review #dataset #named #visualisation: ReDA: A Web-Based Visualization Tool for Analyzing Modern Code Review Dataset (PT, XY, NY, RGK, AECC, KF, HI), pp. 605–608.
MSR-2014-GousiosZ #dataset #development #research: A dataset for pull-based development research (GG, AZ), pp. 368–371.
MSR-2014-LazarRS14a #dataset #debugging #generative: Generating duplicate bug datasets (AL, SR, BS), pp. 392–395.
MSR-2014-MurakamiHK #dataset: A dataset of clone references with gaps (HM, YH, SK), pp. 412–415.
MSR-2014-PassosC #dataset #feature model #kernel #linux: A dataset of feature additions and feature removals from the Linux kernel (LTP, KC), pp. 376–379.
MSR-2014-RoblesRSVG #bibliography #challenge #dataset: FLOSS 2013: a survey dataset about free software contributors: challenges for curating, sharing, and combining (GR, LAR, AS, BV, JMGB), pp. 396–399.
MSR-2014-SainiSOL #dataset #debugging: A dataset for maven artifacts and bug patterns found in them (VS, HS, JO, CVL), pp. 416–419.
MSR-2014-WilliamsRMRK #dataset #modelling: Models of OSS project meta-information: a dataset of three forges (JRW, DDR, NDM, JDR, DSK), pp. 408–411.
MSR-2014-ZhangH #dataset #energy #mining: A green miner’s dataset: mining the impact of software change on energy consumption (CZ, AH), pp. 400–403.
HCI-AIMT-2014-RuffieuxLMK #bibliography #dataset #gesture #recognition: A Survey of Datasets for Human Gesture Recognition (SR, DL, EM, OAK), pp. 337–348.
HIMI-DE-2014-GombosK #dataset #query #recommendation: SPARQL Query Writing with Recommendations Based on Datasets (GG, AK), pp. 310–319.
ICEIS-v1-2014-TimoteoVF #analysis #case study #dataset #network #project management: Evaluating Artificial Neural Networks and Traditional Approaches for Risk Analysis in Software Project Management — A Case Study with PERIL Dataset (CT, MV, SF), pp. 472–479.
ECIR-2014-BelloginSVS #challenge #dataset #evaluation #web: Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track (AB, TS, APdV, AS), pp. 430–436.
ECIR-2014-CarageaWCWRCWG #big data #dataset: CiteSeerX: A Scholarly Big Dataset (CC, JW, AMC, KW, JPFR, HHC, ZW, CLG), pp. 311–322.
ICPR-2014-Calvo-ZaragozaO #dataset #music #recognition: Recognition of Pen-Based Music Notation: The HOMUS Dataset (JCZ, JO), pp. 3038–3043.
ICPR-2014-FletcherI #dataset #evaluation #quality: Quality Evaluation of an Anonymized Dataset (SF, MZI), pp. 3594–3599.
ICPR-2014-SandhanC #dataset #hybrid #pattern matching #pattern recognition #recognition: Handling Imbalanced Datasets by Partially Guided Hybrid Sampling for Pattern Recognition (TS, JYC), pp. 1449–1453.
ICPR-2014-WangS #automation #dataset #multi #segmentation #using: Automatic Multi-organ Segmentation in Non-enhanced CT Datasets Using Hierarchical Shape Priors (CW, ÖS), pp. 3327–3332.
MLDM-2014-JavedA #classification #dataset #network #social #using: Creation of Bi-lingual Social Network Dataset Using Classifiers (IJ, HA), pp. 523–533.
MLDM-2014-WaiyamaiS #approach #classification #dataset: A Cost-Sensitive Based Approach for Improving Associative Classification on Imbalanced Datasets (KW, PS), pp. 31–42.
HPDC-2014-SuAWBS #analysis #correlation #dataset #distributed #parallel: Supporting correlation analysis on scientific datasets in parallel and distributed settings (YS, GA, JW, AB, HWS), pp. 191–202.
DATE-2013-StergiouJ #dataset #optimisation: Optimizing BDDs for time-series dataset manipulation (SS, JJ), pp. 1018–1021.
ICDAR-2013-ShivramRSG #dataset #named: IBM_UB_1: A Dual Mode Unconstrained English Handwriting Dataset (AS, CR, SS, VG), pp. 13–17.
MSR-2013-BinkleyLPHV #dataset #identifier: A dataset for evaluating identifier splitters (DB, DL, LLP, EH, KVS), pp. 401–404.
MSR-2013-ButlerWYS #dataset #identifier #named: INVocD: identifier name vocabulary dataset (SB, MW, YY, HS), pp. 405–408.
MSR-2013-DitHPK #dataset #evaluation #maintenance: A dataset from change history to support evaluation of software maintenance tasks (BD, AH, DP, HHK), pp. 131–134.
MSR-2013-GoeminneCM #dataset #ecosystem #gnome: A historical dataset for the gnome ecosystem (MG, MC, TM), pp. 225–228.
MSR-2013-Gousios #dataset: The GHTorent dataset and tool suite (GG), pp. 233–236.
MSR-2013-HamasakiKYCFI #bibliography #code review #dataset #repository #what: Who does what during a code review? datasets of OSS peer review repositories (KH, RGK, NY, AECC, KF, HI), pp. 49–52.
MSR-2013-JanjicHSA #dataset #research #reuse #source code: An unabridged source code dataset for research in software reuse (WJ, OH, MS, CA), pp. 339–342.
MSR-2013-LamkanfiPD #dataset #debugging #eclipse #fault #mining: The eclipse and mozilla defect tracking dataset: a genuine dataset for mining bug information (AL, JP, SD), pp. 203–206.
MSR-2013-MacLeanK #commit #dataset #network #social: Apache commits: social network dataset (ACM, CDK), pp. 135–138.
MSR-2013-RaemaekersDV #dataset #dependence #metric #repository: The maven repository dataset of metrics, changes, and dependencies (SR, AvD, JV), pp. 221–224.
MSR-2013-Squire #dataset: Project roles in the apache software foundation: a dataset (MS), pp. 301–304.
MSR-2013-Squire13a #dataset #twitter: Apache-affiliated twitter screen names: a dataset (MS), pp. 305–308.
MSR-2013-VasilescuSM #dataset #re-engineering: A historical dataset of software engineering conferences (BV, AS, TM), pp. 373–376.
MSR-2013-WagstromJS #dataset #graph #network #ruby: A network of rails: a graph dataset of ruby on rails and associated projects (PW, CJ, AS), pp. 229–232.
CSCW-2013-RostBCB #challenge #communication #dataset #representation #scalability #social #social media: Representation and communication: challenges in interpreting large social media datasets (MR, LB, HC, BB), pp. 357–362.
CIKM-2013-GilpinQD #clustering #dataset #performance #scalability: Efficient hierarchical clustering of large high dimensional datasets (SG, BQ, ID), pp. 1371–1380.
MLDM-2013-AllahSG #algorithm #array #dataset #mining #performance #scalability: An Efficient and Scalable Algorithm for Mining Maximal — High Confidence Rules from Microarray Dataset (WZAA, YKES, FFMG), pp. 352–366.
MLDM-2013-ParraL #clustering #dataset #using: Unsupervised Tagging of Spanish Lyrics Dataset Using Clustering (FLP, EL), pp. 130–143.
RecSys-2013-Aiolli #dataset #performance #recommendation #scalability: Efficient top-n recommendation for very large scale binary rated datasets (FA), pp. 273–280.
SAC-2013-HapfelmeierSK #dataset #incremental #linear #performance: Incremental linear model trees on massive datasets: keep it simple, keep it fast (AH, JS, SK), pp. 129–135.
HPDC-2013-SuAWMWA #dataset #distributed #using: Taming massive distributed datasets: data sampling using bitmap indices (YS, GA, JW, KM, JW, JPA), pp. 13–24.
HPDC-2013-YinLBGN #dataset #order #performance #pipes and filters #using: Efficient analytics on ordered datasets using MapReduce (JY, YL, MB, LG, AN), pp. 125–126.
DRR-2012-WalkerLR #dataset #documentation #image: A synthetic document image dataset for developing and evaluating historical document processing methods (DDW, WBL, EKR).
VLDB-2012-Shirani-MehrKS #dataset #evaluation #performance #query #reachability #scalability: Efficient Reachability Query Evaluation in Large Spatiotemporal Contact Datasets (HSM, FBK, CS), pp. 848–859.
CHI-2012-FisherPDs #dataset #incremental #performance #scalability #trust #visualisation: Trust me, I’m partially right: incremental visualization lets analysts explore large datasets faster (DF, IOP, SMD, MMCS), pp. 1673–1682.
ICPR-2012-BoomHHF #clustering #dataset #image #using: Supporting ground-truth annotation of image datasets using clustering (BJB, PXH, JH, RBF), pp. 1542–1545.
ICPR-2012-ChenWY #clustering #dataset #graph: Centroid-based clustering for graph datasets (LC, SW, XY), pp. 2144–2147.
ICPR-2012-FausserS #clustering #dataset #kernel #scalability: Clustering large datasets with kernel methods (SF, FS), pp. 501–504.
ICPR-2012-MogelmoseTM #comparative #dataset #detection #evaluation #learning: Learning to detect traffic signs: Comparative evaluation of synthetic and real-world datasets (AM, MMT, TBM), pp. 3452–3455.
ICPR-2012-NafchiK #dataset #image #representation: Rectangular based binary image representation: Theory, applications, and dataset introduction (HZN, HRK), pp. 190–193.
ICPR-2012-TakalaP #dataset #identification #named #network #people: CMV100: A dataset for people tracking and re-identification in sparse camera networks (VT, MP), pp. 1387–1390.
ICPR-2012-TanLZ #dataset: The dataset system of Economic Dispute handwritten (DSEDH) based on stroke shape and structure features (JT, JHL, XXZ), pp. 661–664.
ICPR-2012-Utasi #classification #dataset: Weighted conditional mutual information based boosting for classification of imbalanced datasets (ÁU), pp. 2711–2714.
KDD-2012-ShiA #dataset #mobile #recommendation: GetJar mobile application recommendations with very sparse datasets (KS, KA), pp. 204–212.
KDIR-2012-KharbatBO #algorithm #case study #dataset: A New Compaction Algorithm for LCS Rules — Breast Cancer Dataset Case Study (FK, LB, MO), pp. 382–385.
KDIR-2012-Vanetik #classification #dataset: Classification of Datasets with Frequent Itemsets is Wild (NV), pp. 386–389.
MLDM-2012-JoutsijokiJ #case study #dataset: DAGSVM vs. DAGKNN: An Experimental Case Study with Benthic Macroinvertebrate Dataset (HJ, MJ), pp. 439–453.
SIGIR-2012-HuO #classification #dataset #using: Genre classification for million song dataset using confidence-based classifiers combination (YH, MO), pp. 1083–1084.
SAC-2012-GomiI #3d #dataset #image #mobile #multi #named: MINI: a 3D mobile image browser with multi-dimensional datasets (AG, TI), pp. 989–996.
HPDC-2012-HefeedaGA #approximate #clustering #dataset #distributed #scalability: Distributed approximate spectral clustering for large-scale datasets (MH, FG, WAA), pp. 223–234.
ICDAR-2011-AlaeiNP #benchmark #dataset #documentation #metric #segmentation: A Benchmark Kannada Handwritten Document Dataset and Its Segmentation (AA, PN, UP), pp. 141–145.
ICDAR-2011-QuiniouMSVMPM #dataset #named: HAMEX — A Handwritten and Audio Dataset of Mathematical Expressions (SQ, HM, SPS, CVG, EM, SP, SM), pp. 452–456.
SIGMOD-2011-DuanKSU #benchmark #comparison #dataset #metric #rdf: Apples and oranges: a comparison of RDF benchmarks and real RDF datasets (SD, AK, KS, OU), pp. 145–156.
SIGMOD-2011-KashyapP #agile #dataset #development #interface #query #xml: Rapid development of web-based query interfaces for XML datasets with QURSED (AK, MP), pp. 1339–1342.
VLDB-2012-BarskyKWH11 #correlation #dataset #mining #scalability #taxonomy: Mining Flipping Correlations from Large Datasets with Taxonomies (MB, SK, TW, JH), pp. 370–381.
OCSC-2011-AliprandiRMTM #dataset #rdf #semantics #web #wiki: Extracting Events from Wikipedia as RDF Triples Linked to Widespread Semantic Web Datasets (CA, FR, AM, MT, SM), pp. 90–99.
CIKM-2011-CachedaCFF #algorithm #analysis #dataset #nearest neighbour: Improving k-nearest neighbors algorithms: practical application of dataset analysis (FC, VC, DF, VF), pp. 2253–2256.
CIKM-2011-LiuYS #dataset #query: Subject-oriented top-k hot region queries in spatial dataset (JL, GY, HS), pp. 2409–2412.
CIKM-2011-SelvarajBSS #classification #dataset: Semi-supervised SVMs for classification with unknown class proportions and a small labeled dataset (SKS, BB, SS, SKS), pp. 653–662.
KDD-2011-CordeiroTTLKF #clustering #dataset #multi #pipes and filters #scalability: Clustering very large multi-dimensional datasets with MapReduce (RLFC, CTJ, AJMT, JL, UK, CF), pp. 690–698.
SIGIR-2011-LeeHWHS #dataset #graph #image #learning #multi #pipes and filters #scalability #using: Multi-layer graph-based semi-supervised learning for large-scale image datasets using mapreduce (WYL, LCH, GLW, WHH, YFS), pp. 1121–1122.
SIGMOD-2010-WangWLWWLTXL #dataset #detection #named: MapDupReducer: detecting near duplicates over massive datasets (CW, JW, XL, WW, HW, HL, WT, JX, RL), pp. 1119–1122.
VLDB-2010-LiD #dataset #probability #ranking: Ranking Continuous Probabilistic Datasets (JL, AD), pp. 638–649.
VLDB-2010-Matsudaira #3d #biology #dataset #scalability: High-End Biological Imaging Generates Very Large 3D+ and Dynamic Datasets (PM), p. 3.
VLDB-2010-MelnikGLRSTV #analysis #dataset #interactive #named: Dremel: Interactive Analysis of Web-Scale Datasets (SM, AG, JJL, GR, SS, MT, TV), pp. 330–339.
VLDB-2010-ParameswaranGR #concept #dataset #scalability #towards #web: Towards The Web of Concepts: Extracting Concepts from Large Datasets (AGP, HGM, AR), pp. 566–577.
MSR-2010-BachmannB #correlation #dataset #debugging #process #quality #re-engineering: When process data quality affects the number of bugs: Correlations in software engineering datasets (AB, AB), pp. 62–71.
WCRE-2010-NguyenAH #bias #case study #dataset #debugging: A Case Study of Bias in Bug-Fix Datasets (THDN, BA, AEH), pp. 259–268.
PLDI-2010-ChenHEFPTW #dataset #optimisation: Evaluating iterative optimization across 1000 datasets (YC, YH, LE, GF, LP, OT, CW), pp. 448–459.
STOC-2010-BravermanO #dataset #independence: Measuring independence of datasets (VB, RO), pp. 271–280.
CIKM-2010-CormodeKW #algorithm #dataset #scalability #set: Set cover algorithms for very large datasets (GC, HJK, AW), pp. 479–488.
CIKM-2010-HarpaleYGHY #dataset #multi #named #performance #personalisation: CiteData: a new multi-faceted dataset for evaluating personalized search performance (AH, YY, SG, DH, ZY), pp. 549–558.
ICML-2010-SyedR #dataset #identification: Unsupervised Risk Stratification in Clinical Datasets: Identifying Patients at Risk of Rare Outcomes (ZS, IR), pp. 1023–1030.
ICML-2010-TanWT #dataset #feature model #learning: Learning Sparse SVM for Feature Selection on Very High Dimensional Datasets (MT, LW, IWT), pp. 1047–1054.
ICPR-2010-AlvarezSVO #dataset: Perceptual Color Texture Codebooks for Retrieving in Highly Diverse Texture Datasets (SÁ, AS, MV, XO), pp. 866–869.
ICPR-2010-KimM #classification #dataset: Dense Structure Inference for Object Classification in Aerial LIDAR Dataset (EK, GGM), pp. 3049–3052.
ICPR-2010-PapaCF #classification #dataset #optimisation: Optimizing Optimum-Path Forest Classification for Huge Datasets (JPP, FAMC, AXF), pp. 4162–4165.
ICPR-2010-SodaI #composition #dataset #integration #learning: Decomposition Methods and Learning Approaches for Imbalanced Dataset: An Experimental Integration (PS, GI), pp. 3117–3120.
ICPR-2010-StottingerZKH #dataset #evaluation: FeEval: A Dataset for Evaluation of Spatio-temporal Local Features (JS, SZ, RK, AH), pp. 499–502.
RecSys-2010-PilaszyZT #dataset #feedback #matrix #performance: Fast als-based matrix factorization for explicit and implicit feedback datasets (IP, DZ, DT), pp. 71–78.
SAC-2010-BaralisCC #dataset #mining #persistent #scalability: A persistent HY-Tree to efficiently support itemset mining on large datasets (EB, TC, SC), pp. 1060–1064.
SAC-2010-LuccheseOP #dataset #generative #mining: A generative pattern model for mining binary datasets (CL, SO, RP), pp. 1109–1110.
ICDAR-2009-AntonacopoulosBPP #analysis #dataset #documentation #evaluation #layout #performance: A Realistic Dataset for Performance Evaluation of Document Layout Analysis (AA, DB, CP, SP), pp. 296–300.
CHI-2009-LeeSRCT #dataset #named #roadmap: FacetLens: exposing trends and relationships to support sensemaking within faceted datasets (BL, GS, GGR, MC, DST), pp. 1293–1302.
CIKM-2009-BalachandranPK #clustering #configuration management #dataset #documentation: Interpretable and reconfigurable clustering of document datasets by deriving word-based rules (VB, DP, DK), pp. 1773–1776.
CIKM-2009-Muntes-MuleroN #dataset #privacy #scalability: Privacy and anonymization for very large datasets (VMM, JN), pp. 2117–2118.
CIKM-2009-StoyanovichA #clustering #dataset: Rank-aware clustering of structured datasets (JS, SAY), pp. 1429–1432.
KDD-2009-DundarHBRR #case study #dataset #detection #learning #using: Learning with a non-exhaustive training dataset: a case study: detection of bacteria cultures using optical-scattering technology (MD, EDH, AKB, JPR, BR), pp. 279–288.
MLDM-2009-CelepcikayEO #dataset #using: Regional Pattern Discovery in Geo-referenced Datasets Using PCA (OUC, CFE, CO), pp. 719–733.
MLDM-2009-SegataB #dataset #performance #scalability: Fast Local Support Vector Machines for Large Datasets (NS, EB), pp. 295–310.
SAC-2009-OwensMR #dataset #mining: Capturing truthiness: mining truth tables in binary datasets (CCOI, TMM, NR), pp. 1467–1474.
ESEC-FSE-2009-BirdBADBFD #bias #dataset #debugging: Fair and balanced?: bias in bug-fix datasets (CB, AB, EA, JD, AB, VF, PTD), pp. 121–130.
SIGMOD-2008-PanZW #clustering #composition #dataset #matrix #named #performance #scalability: CRD: fast co-clustering on large datasets utilizing sampling-based matrix decomposition (FP, XZ, WW), pp. 173–184.
ICEIS-DISI-2008-MuldnerML #data access #dataset #policy #xml: Succinct Access Control Policies for Published XML Datasets (TM, JKM, GL), pp. 380–385.
CIKM-2008-ChoudharyMB #dataset #evolution #on the: On quantifying changes in temporally evolving dataset (RC, SM, AB), pp. 1459–1460.
CIKM-2008-LiuLNBMG #dataset #feature model #performance #preprocessor #realtime #scalability: Real-time data pre-processing technique for efficient feature extraction in large scale datasets (YL, LVL, RSN, KB, PM, CLG), pp. 981–990.
CIKM-2008-NguyenS #analysis #correlation #dataset #performance: Fast correlation analysis on time series datasets (PN, NS), pp. 787–796.
CIKM-2008-ShawXG #approximate #dataset: Deriving non-redundant approximate association rules from hierarchical datasets (GS, YX, SG), pp. 1451–1452.
ICML-2008-WolfeHK #dataset #distributed #scalability: Fully distributed EM for very large datasets (JW, AH, DK), pp. 1184–1191.
ICPR-2008-WatanabeK08a #dataset #scalability: RANSAC-SVM for large-scale datasets (KW, TK), pp. 1–4.
KDD-2008-DasSN #category theory #dataset #detection: Anomaly pattern detection in categorical datasets (KD, JGS, DBN), pp. 169–176.
FSE-2008-OsterweilCEPWBH #dataset #experience #process #using #workflow: Experience in using a process language to define scientific workflow and generate dataset provenance (LJO, LAC, AME, RMP, AEW, ERB, JLH), pp. 319–329.
SIGMOD-2007-XiaoT #dataset #named #privacy #towards: M-invariance: towards privacy preserving re-publication of dynamic datasets (XX, YT), pp. 689–700.
HIMI-IIE-2007-CollinsNMRBCPFHP #collaboration #dataset: The Karst Collaborative Workspace for Analyzing and Annotating Scientific Datasets (LMC, DEN, MLBM, JVR, MAB, CRC, JEP, BFS, SKH, JCP), pp. 3–12.
KDD-2007-DasS #category theory #dataset #detection: Detecting anomalous records in categorical datasets (KD, JGS), pp. 220–229.
PADL-2007-Costa #dataset #performance #prolog: Prolog Performance on Larger Datasets (VSC), pp. 185–199.
DRR-2006-ZhangA #dataset #towards: Toward quantifying the amount of style in a dataset (XZ, SA).
SIGMOD-2006-KiferG #dataset #injection: Injecting utility into anonymized datasets (DK, JG), pp. 217–228.
VLDB-2006-GemullaLH #dataset #evolution #maintenance: A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets (RG, WL, PJH), pp. 595–606.
VLDB-2006-JiTT #3d #dataset #mining: Mining Frequent Closed Cubes in 3D Datasets (LJ, KLT, AKHT), pp. 811–822.
ICPR-v1-2006-AnC #dataset: Finding Rule Groups to Classify High Dimensional Gene Expression Datasets (JA, YPPC), pp. 1196–1199.
ICPR-v3-2006-FelsbergG #dataset #multi #named #robust #scalability: P-Channels: Robust Multivariate M-Estimation of Large Datasets (MF, GHG), pp. 262–267.
ICPR-v4-2006-Jun #comparison #dataset #detection: A Peer Dataset Comparison Outlier Detection Model Applied to Financial Surveillance (TJ), pp. 900–903.
SAC-2006-FangG #bound #dataset: Boundary surface extraction and rendering for volume datasets (SF, PG), pp. 1356–1360.
SAC-2006-NemalhabibS #algorithm #category theory #clustering #dataset #named: CLUC: a natural clustering algorithm for categorical datasets based on cohesion (AN, NS), pp. 637–638.
ICDAR-2005-AgrawalBMV #dataset #named #online #representation #xml: UPX: A New XML Representation for Annotated Datasets of Online Handwriting Data (MA, KB, SM, LV), pp. 1161–1165.
ICEIS-v2-2005-DoP #dataset #mining #scalability #visualisation: Mining Very Large Datasets with SVM and Visualization (TND, FP), pp. 127–141.
KDD-2005-JinWPPA #dataset #graph: Discovering frequent topological structures from graph datasets (RJ, CW, DP, SP, GA), pp. 606–611.
KDD-2005-ZakiPAS #algorithm #category theory #clustering #dataset #effectiveness #mining #named: CLICKS: an effective algorithm for mining subspace clusters in categorical datasets (MJZ, MP, IA, TS), pp. 736–742.
MLDM-2005-FerrandizB05a #dataset #evaluation: Supervised Evaluation of Dataset Partitions: Advantages and Practice (SF, MB), pp. 600–609.
MLDM-2005-SiaL #clustering #dataset #scalability #using: Clustering Large Dynamic Datasets Using Exemplar Points (WS, MML), pp. 163–173.
SIGMOD-2004-CongXPTY #array #dataset #named: FARMER: Finding Interesting Rule Groups in Microarray Datasets (GC, AKHT, XX, FP, JY), pp. 143–154.
VLDB-2004-HoweM #algebra #dataset: Algebraic Manipulation of Scientific Datasets (BH, DM), pp. 924–935.
CIKM-2004-ChenL #clustering #dataset #named #scalability #visualisation: ClusterMap: labeling clusters in large datasets via visualization (KC, LL), pp. 285–293.
CIKM-2004-ChungJM #clustering #dataset #mining #using: Mining gene expression datasets using density-based clustering (SC, JJ, DM), pp. 150–151.
ICPR-v4-2004-WillisC #3d #dataset #multi #symmetry: Alignment of Multiple Non-Overlapping Axially Symmetric 3D Datasets (ARW, DBC), pp. 96–99.
KDD-2004-TruongLB #dataset #learning #random #using: Learning a complex metabolomic dataset using random forests and support vector machines (YT, XL, CB), pp. 835–840.
SIGIR-2004-DavidovGM #categorisation #dataset #generative: Parameterized generation of labeled datasets for text categorization based on a hierarchical directory (DD, EG, SM), pp. 250–257.
SAC-2004-AdamJA #dataset #detection: Neighborhood based detection of anomalies in high dimensional spatio-temporal sensor datasets (NRA, VPJ, VA), pp. 576–583.
SAC-2004-CarswellGN #dataset #multi #semantics #transaction: Wireless spatio-semantic transactions on multimedia datasets (JDC, KG, MN), pp. 1201–1205.
DRR-2003-HauserSSDST #dataset: Correcting OCR text by association with historical datasets (SEH, JS, TFS, DDF, SS, GRT), pp. 84–93.
VLDB-2003-LinLYZ #dataset #multi #scalability: Multiscale Histograms: Summarizing Topological Relations in Large Spatial Datasets (XL, QL, YY, XZ), pp. 814–825.
ICEIS-v2-2003-DoP #algorithm #dataset #mining #scalability: Mining Very Large Datasets with Support Vector Machine Algorithms (TND, FP), pp. 140–147.
ICML-2003-LeskovecS #dataset #linear #programming: Linear Programming Boosting for Uneven Datasets (JL, JST), pp. 456–463.
ICML-2003-ZhuWC #dataset #scalability: Eliminating Class Noise in Large Datasets (XZ, XW, QC), pp. 920–927.
KDD-2003-El-HajjZ #dataset #interactive #matrix #mining #performance #scalability: Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining (MEH, ORZ), pp. 109–118.
KDD-2003-KoyuturkG #dataset #framework #named: PROXIMUS: a framework for analyzing very high dimensional discrete-attributed datasets (MK, AG), pp. 147–156.
KDD-2003-PanCTYZ #biology #dataset #named: Carpenter: finding closed patterns in long biological datasets (FP, GC, AKHT, JY, MJZ), pp. 637–642.
KDD-2003-PeterCG #algorithm #clustering #dataset #scalability: New unsupervised clustering algorithm for large datasets (WP, JC, CG), pp. 643–648.
MLDM-2003-LeleuRBE #dataset #mining #named: GO-SPADE: Mining Sequential Patterns over Datasets with Consecutive Repetitions (ML, CR, JFB, GE), pp. 293–306.
SAC-2003-ChenL #clustering #dataset #visualisation: Cluster Rendering of Skewed Datasets via Visualization (KC, LL), pp. 909–916.
CIKM-2002-ZhaoK #algorithm #clustering #dataset #documentation #evaluation: Evaluation of hierarchical clustering algorithms for document datasets (YZ, GK), pp. 515–524.
KDD-2002-RidgewayM #analysis #dataset: Bayesian analysis of massive datasets via particle filters (GR, DM), pp. 5–13.
KDD-2002-TantrumMS #clustering #dataset #modelling #scalability: Hierarchical model-based clustering of large datasets through fractionation and refractionation (JT, AM, WS), pp. 183–190.
ICEIS-v1-2001-KotsisWFM #dataset #multi #novel #visualisation: Novel Data Visualisation and Exploration in Multidimensional Datasets (NK, GRSW, JDF, DRM), pp. 170–175.
ICML-2001-DomeniconiG #approach #approximate #classification #dataset #multi #nearest neighbour #performance #query #scalability: An Efficient Approach for Approximating Multi-dimensional Range Queries and Nearest Neighbor Classification in Large Datasets (CD, DG), pp. 98–105.
KDD-2001-BeygelzimerPM #category theory #dataset #performance #scalability #visualisation: Fast ordering of large categorical datasets for better visualization (AB, CSP, SM), pp. 239–244.
KDD-2000-BarbaraC #clustering #dataset #using: Using the fractal dimension to cluster datasets (DB, PC), pp. 260–264.
KDD-2000-Yang #3d #dataset #interactive #relational #scalability: Interactive exploration of very large relational datasets through 3D dynamic projections (LY), pp. 236–243.
KDD-2000-ZhangDR #constraints #dataset #scalability: Exploring constraints to efficiently mine emerging patterns from large high-dimensional datasets (XZ, GD, KR), pp. 310–314.
SIGMOD-1999-MankuRL #dataset #online #order #performance #random #scalability #statistics: Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets (GSM, SR, BGL), pp. 251–262.
KDD-1999-DaviesM #dataset #network: Bayesian Networks for Lossless Dataset Compression (SD, AWM), pp. 387–391.
VLDB-1998-GehrkeRG #dataset #framework #named #performance #scalability: RainForest — A Framework for Fast Decision Tree Construction of Large Datasets (JG, RR, VG), pp. 416–427.
VLDB-1998-KnorrN #algorithm #dataset #mining #scalability: Algorithms for Mining Distance-Based Outliers in Large Datasets (EMK, RTN), pp. 392–403.
VLDB-1998-ShuklaDN #dataset #multi: Materialized View Selection for Multidimensional Datasets (AS, PD, JFN), pp. 488–499.
KDD-1998-AlsabtiRS #classification #dataset #named #scalability: CLOUDS: A Decision Tree Classifier for Large Datasets (KA, SR, VS), pp. 2–8.
KDD-1998-OatesJ #dataset #modelling #scalability: Large Datasets Lead to Overly Complex Models: An Explanation and a Solution (TO, DJ), pp. 294–298.
SIGMOD-1997-KornJF #ad hoc #dataset #query #scalability #sequence: Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences (FK, HVJ, CF), pp. 289–300.
SIGMOD-1997-LivnyRBCDLMW #dataset #named #query #scalability #visualisation: DEVise: Integrated Querying and Visualization of Large Datasets (ML, RR, KSB, GC, DD, SL, JM, RKW), pp. 301–312.
SIGMOD-1997-LivnyRBCDLMW97a #dataset #named #query #scalability #visual notation: DEVise: Integrated Querying and Visual Exploration of Large Datasets (Demo Abstract) (ML, RR, KSB, GC, DD, SL, JM, RKW), pp. 517–520.
KDD-1997-ZupanBBC #approach #composition #data mining #dataset #mining: A Dataset Decomposition Approach to Data Mining and Machine Discovery (BZ, MB, IB, BC), pp. 299–302.
SIGMOD-1995-FaloutsosL #algorithm #dataset #multi #named #performance #visualisation: FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets (CF, KIL), pp. 163–174.
KDD-1995-StolorzNMMSSYNCMF #data mining #dataset #mining #performance #scalability: Fast Spatio-Temporal Data Mining of Large Geophysical Datasets (PES, HN, EM, RRM, ECS, JRS, JY, KWN, SYC, CRM, JDF), pp. 300–305.