205 papers:
- DRR-2015-ChenSWLHI #analysis #dataset #documentation #layout
- Ground truth model, tool, and dataset for layout analysis of historical documents (KC, MS, HW, ML, JH, RI), p. 940204.
- HT-2015-RoutB #algorithm #dataset #ranking #twitter
- A Human-annotated Dataset for Evaluating Tweet Ranking Algorithms (DPR, KB), pp. 95–99.
- PODS-2015-Cormode #dataset #scalability #summary
- Compact Summaries over Large Datasets (GC), pp. 157–158.
- VLDB-2015-BhattacherjeeCH #dataset #trade-off #version control
- Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff (SB, AC, SH, AD, AGP), pp. 1346–1357.
- VLDB-2015-HarbiAKM #dataset #query #rdf
- Evaluating SPARQL Queries on Massive RDF Datasets (RH, IA, PK, NM), pp. 1848–1859.
- ICSME-2015-JansenH #dataset #industrial #smell #spreadsheet
- Code smells in spreadsheet formulas revisited on an industrial dataset (BJ, FH), pp. 372–380.
- MSR-2015-AltingerSDW #dataset #embedded #fault #industrial #modelling #novel #predict
- A Novel Industry Grade Dataset for Fault Prediction Based on Model-Driven Developed Automotive Embedded Software (HA, SS, YD, FW), pp. 494–497.
- MSR-2015-GermanAH #dataset #git #linux #process
- A Dataset of the Activity of the Git Super-repository of Linux in 2012 (DMG, BA, AEH), pp. 470–473.
- MSR-2015-HabayebMMBB #dataset #fault
- The Firefox Temporal Defect Dataset (MH, AVM, SSM, LB, AB), pp. 498–501.
- MSR-2015-KrutzMMRPFS #android #dataset #open source
- A Dataset of Open-Source Android Applications (DEK, MM, SAM, AR, JP, AF, JS), pp. 522–525.
- MSR-2015-MauczkaBSG #commit #dataset #developer
- Dataset of Developer-Labeled Commit Messages (AM, FB, CS, TG), pp. 490–493.
- MSR-2015-OhiraKYYMLFHIM #classification #dataset #debugging
- A Dataset of High Impact Bugs: Manually-Classified Issue Reports (MO, YK, YY, HY, YM, NL, KF, HH, AI, KiM), pp. 518–521.
- MSR-2015-PalombaNTBOPL #dataset #evaluation #named #open data #smell
- Landfill: An Open Dataset of Code Smells with Public Evaluation (FP, DDN, MT, GB, RO, DP, ADL), pp. 482–485.
- MSR-2015-SawantB #api #dataset
- A Dataset for API Usage (AAS, AB), pp. 506–509.
- MSR-2015-WermelingerY #architecture #dataset #evolution
- An Architectural Evolution Dataset (MW, YY), pp. 502–505.
- MSR-2015-Zacchiroli #dataset #metadata #source code
- The Debsources Dataset: Two Decades of Debian Source Code Metadata (SZ), pp. 466–469.
- SCAM-2015-AivaloglouHH #dataset #scalability #spreadsheet
- A grammar for spreadsheet formulas evaluated on two large datasets (EA, DH, FH), pp. 121–130.
- CSCW-2015-QuattroneCM #bias #dataset
- There’s No Such Thing as the Perfect Map: Quantifying Bias in Spatial Crowd-sourcing Datasets (GQ, LC, PDM), pp. 1021–1032.
- ICEIS-v2-2015-SarinhoLS #dataset #linked data #open data #question
- Can You Find All the Data You Expect in a Linked Dataset? (WTS, BFL, DS), pp. 648–655.
- ICML-2015-BarbosaENW #dataset #distributed #power of
- The Power of Randomization: Distributed Submodular Maximization on Massive Datasets (RdPB, AE, HLN, JW), pp. 1236–1244.
- ICML-2015-MaLF #analysis #canonical #correlation #dataset #linear #scalability
- Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis (ZM, YL, DPF), pp. 169–178.
- KDD-2015-CaoWYR #dataset #online #scalability
- Online Outlier Exploration Over Large Datasets (LC, MW, DY, EAR), pp. 89–98.
- RecSys-2015-Ben-ShimonTFSRH #challenge #dataset
- RecSys Challenge 2015 and the YOOCHOOSE Dataset (DBS, AT, MF, BS, LR, JH), pp. 357–358.
- SIGIR-2015-MorenoD #adaptation #dataset #metric #semistructured data
- Adapted B-CUBED Metrics to Unbalanced Datasets (JGM, GD), pp. 911–914.
- SAC-2015-RochaRCOMVADGF #algorithm #classification #dataset #documentation #named #performance #using
- G-KNN: an efficient document classification algorithm for sparse datasets on GPUs using KNN (LCdR, GSR, RC, RSO, DM, FV, GA, SD, MAG, RF), pp. 1335–1338.
- ICSE-v2-2015-HermansM #analysis #dataset #email #spreadsheet
- Enron’s Spreadsheets and Related Emails: A Dataset and Analysis (FH, ERMH), pp. 7–16.
- DRR-2014-BrunoL #dataset #documentation #open data #recognition #research
- The Lehigh Steel Collection: a new open dataset for document recognition research (BB, DPL), p. ?–9.
- SIGMOD-2014-SatishSPSPHSYD #dataset #framework #graph #navigation #using
- Navigating the maze of graph analytics frameworks using massive graph datasets (NS, NS, MMAP, JS, JP, MAH, SS, ZY, PD), pp. 979–990.
- VLDB-2015-MozafariSFJM14 #dataset #learning #scalability
- Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning (BM, PS, MJF, MIJ, SM), pp. 125–136.
- ICSME-2014-ThongtanunamYYKCFI #bibliography #code review #dataset #named #visualisation
- ReDA: A Web-Based Visualization Tool for Analyzing Modern Code Review Dataset (PT, XY, NY, RGK, AECC, KF, HI), pp. 605–608.
- MSR-2014-GousiosZ #dataset #development #research
- A dataset for pull-based development research (GG, AZ), pp. 368–371.
- MSR-2014-LazarRS14a #dataset #debugging #generative
- Generating duplicate bug datasets (AL, SR, BS), pp. 392–395.
- MSR-2014-MurakamiHK #dataset
- A dataset of clone references with gaps (HM, YH, SK), pp. 412–415.
- MSR-2014-PassosC #dataset #feature model #kernel #linux
- A dataset of feature additions and feature removals from the Linux kernel (LTP, KC), pp. 376–379.
- MSR-2014-RoblesRSVG #bibliography #challenge #dataset
- FLOSS 2013: a survey dataset about free software contributors: challenges for curating, sharing, and combining (GR, LAR, AS, BV, JMGB), pp. 396–399.
- MSR-2014-SainiSOL #dataset #debugging
- A dataset for maven artifacts and bug patterns found in them (VS, HS, JO, CVL), pp. 416–419.
- MSR-2014-WilliamsRMRK #dataset #modelling
- Models of OSS project meta-information: a dataset of three forges (JRW, DDR, NDM, JDR, DSK), pp. 408–411.
- MSR-2014-ZhangH #dataset #energy #mining
- A green miner’s dataset: mining the impact of software change on energy consumption (CZ, AH), pp. 400–403.
- HCI-AIMT-2014-RuffieuxLMK #bibliography #dataset #gesture #recognition
- A Survey of Datasets for Human Gesture Recognition (SR, DL, EM, OAK), pp. 337–348.
- HIMI-DE-2014-GombosK #dataset #query #recommendation
- SPARQL Query Writing with Recommendations Based on Datasets (GG, AK), pp. 310–319.
- ICEIS-v1-2014-TimoteoVF #analysis #case study #dataset #network #project management
- Evaluating Artificial Neural Networks and Traditional Approaches for Risk Analysis in Software Project Management — A Case Study with PERIL Dataset (CT, MV, SF), pp. 472–479.
- ECIR-2014-BelloginSVS #challenge #dataset #evaluation #web
- Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track (AB, TS, APdV, AS), pp. 430–436.
- ECIR-2014-CarageaWCWRCWG #big data #dataset
- CiteSeerX: A Scholarly Big Dataset (CC, JW, AMC, KW, JPFR, HHC, ZW, CLG), pp. 311–322.
- ICPR-2014-Calvo-ZaragozaO #dataset #music #recognition
- Recognition of Pen-Based Music Notation: The HOMUS Dataset (JCZ, JO), pp. 3038–3043.
- ICPR-2014-FletcherI #dataset #evaluation #quality
- Quality Evaluation of an Anonymized Dataset (SF, MZI), pp. 3594–3599.
- ICPR-2014-SandhanC #dataset #hybrid #pattern matching #pattern recognition #recognition
- Handling Imbalanced Datasets by Partially Guided Hybrid Sampling for Pattern Recognition (TS, JYC), pp. 1449–1453.
- ICPR-2014-WangS #automation #dataset #multi #segmentation #using
- Automatic Multi-organ Segmentation in Non-enhanced CT Datasets Using Hierarchical Shape Priors (CW, ÖS), pp. 3327–3332.
- MLDM-2014-JavedA #classification #dataset #network #social #using
- Creation of Bi-lingual Social Network Dataset Using Classifiers (IJ, HA), pp. 523–533.
- MLDM-2014-WaiyamaiS #approach #classification #dataset
- A Cost-Sensitive Based Approach for Improving Associative Classification on Imbalanced Datasets (KW, PS), pp. 31–42.
- HPDC-2014-SuAWBS #analysis #correlation #dataset #distributed #parallel
- Supporting correlation analysis on scientific datasets in parallel and distributed settings (YS, GA, JW, AB, HWS), pp. 191–202.
- DATE-2013-StergiouJ #dataset #optimisation
- Optimizing BDDs for time-series dataset manipulation (SS, JJ), pp. 1018–1021.
- ICDAR-2013-ShivramRSG #dataset #named
- IBM_UB_1: A Dual Mode Unconstrained English Handwriting Dataset (AS, CR, SS, VG), pp. 13–17.
- MSR-2013-BinkleyLPHV #dataset #identifier
- A dataset for evaluating identifier splitters (DB, DL, LLP, EH, KVS), pp. 401–404.
- MSR-2013-ButlerWYS #dataset #identifier #named
- INVocD: identifier name vocabulary dataset (SB, MW, YY, HS), pp. 405–408.
- MSR-2013-DitHPK #dataset #evaluation #maintenance
- A dataset from change history to support evaluation of software maintenance tasks (BD, AH, DP, HHK), pp. 131–134.
- MSR-2013-GoeminneCM #dataset #ecosystem #gnome
- A historical dataset for the gnome ecosystem (MG, MC, TM), pp. 225–228.
- MSR-2013-Gousios #dataset
- The GHTorrent dataset and tool suite (GG), pp. 233–236.
- MSR-2013-HamasakiKYCFI #bibliography #code review #dataset #repository #what
- Who does what during a code review? datasets of OSS peer review repositories (KH, RGK, NY, AECC, KF, HI), pp. 49–52.
- MSR-2013-JanjicHSA #dataset #research #reuse #source code
- An unabridged source code dataset for research in software reuse (WJ, OH, MS, CA), pp. 339–342.
- MSR-2013-LamkanfiPD #dataset #debugging #eclipse #fault #mining
- The eclipse and mozilla defect tracking dataset: a genuine dataset for mining bug information (AL, JP, SD), pp. 203–206.
- MSR-2013-MacLeanK #commit #dataset #network #social
- Apache commits: social network dataset (ACM, CDK), pp. 135–138.
- MSR-2013-RaemaekersDV #dataset #dependence #metric #repository
- The maven repository dataset of metrics, changes, and dependencies (SR, AvD, JV), pp. 221–224.
- MSR-2013-Squire #dataset
- Project roles in the apache software foundation: a dataset (MS), pp. 301–304.
- MSR-2013-Squire13a #dataset #twitter
- Apache-affiliated twitter screen names: a dataset (MS), pp. 305–308.
- MSR-2013-VasilescuSM #dataset #re-engineering
- A historical dataset of software engineering conferences (BV, AS, TM), pp. 373–376.
- MSR-2013-WagstromJS #dataset #graph #network #ruby
- A network of rails: a graph dataset of ruby on rails and associated projects (PW, CJ, AS), pp. 229–232.
- CSCW-2013-RostBCB #challenge #communication #dataset #representation #scalability #social #social media
- Representation and communication: challenges in interpreting large social media datasets (MR, LB, HC, BB), pp. 357–362.
- CIKM-2013-GilpinQD #clustering #dataset #performance #scalability
- Efficient hierarchical clustering of large high dimensional datasets (SG, BQ, ID), pp. 1371–1380.
- MLDM-2013-AllahSG #algorithm #array #dataset #mining #performance #scalability
- An Efficient and Scalable Algorithm for Mining Maximal — High Confidence Rules from Microarray Dataset (WZAA, YKES, FFMG), pp. 352–366.
- MLDM-2013-ParraL #clustering #dataset #using
- Unsupervised Tagging of Spanish Lyrics Dataset Using Clustering (FLP, EL), pp. 130–143.
- RecSys-2013-Aiolli #dataset #performance #recommendation #scalability
- Efficient top-n recommendation for very large scale binary rated datasets (FA), pp. 273–280.
- SAC-2013-HapfelmeierSK #dataset #incremental #linear #performance
- Incremental linear model trees on massive datasets: keep it simple, keep it fast (AH, JS, SK), pp. 129–135.
- HPDC-2013-SuAWMWA #dataset #distributed #using
- Taming massive distributed datasets: data sampling using bitmap indices (YS, GA, JW, KM, JW, JPA), pp. 13–24.
- HPDC-2013-YinLBGN #dataset #order #performance #pipes and filters #using
- Efficient analytics on ordered datasets using MapReduce (JY, YL, MB, LG, AN), pp. 125–126.
- DRR-2012-WalkerLR #dataset #documentation #image
- A synthetic document image dataset for developing and evaluating historical document processing methods (DDW, WBL, EKR).
- VLDB-2012-Shirani-MehrKS #dataset #evaluation #performance #query #reachability #scalability
- Efficient Reachability Query Evaluation in Large Spatiotemporal Contact Datasets (HSM, FBK, CS), pp. 848–859.
- CHI-2012-FisherPDs #dataset #incremental #performance #scalability #trust #visualisation
- Trust me, I’m partially right: incremental visualization lets analysts explore large datasets faster (DF, IOP, SMD, MMCS), pp. 1673–1682.
- ICPR-2012-BoomHHF #clustering #dataset #image #using
- Supporting ground-truth annotation of image datasets using clustering (BJB, PXH, JH, RBF), pp. 1542–1545.
- ICPR-2012-ChenWY #clustering #dataset #graph
- Centroid-based clustering for graph datasets (LC, SW, XY), pp. 2144–2147.
- ICPR-2012-FausserS #clustering #dataset #kernel #scalability
- Clustering large datasets with kernel methods (SF, FS), pp. 501–504.
- ICPR-2012-MogelmoseTM #comparative #dataset #detection #evaluation #learning
- Learning to detect traffic signs: Comparative evaluation of synthetic and real-world datasets (AM, MMT, TBM), pp. 3452–3455.
- ICPR-2012-NafchiK #dataset #image #representation
- Rectangular based binary image representation: Theory, applications, and dataset introduction (HZN, HRK), pp. 190–193.
- ICPR-2012-TakalaP #dataset #identification #named #network #people
- CMV100: A dataset for people tracking and re-identification in sparse camera networks (VT, MP), pp. 1387–1390.
- ICPR-2012-TanLZ #dataset
- The dataset system of Economic Dispute handwritten (DSEDH) based on stroke shape and structure features (JT, JHL, XXZ), pp. 661–664.
- ICPR-2012-Utasi #classification #dataset
- Weighted conditional mutual information based boosting for classification of imbalanced datasets (ÁU), pp. 2711–2714.
- KDD-2012-ShiA #dataset #mobile #recommendation
- GetJar mobile application recommendations with very sparse datasets (KS, KA), pp. 204–212.
- KDIR-2012-KharbatBO #algorithm #case study #dataset
- A New Compaction Algorithm for LCS Rules — Breast Cancer Dataset Case Study (FK, LB, MO), pp. 382–385.
- KDIR-2012-Vanetik #classification #dataset
- Classification of Datasets with Frequent Itemsets is Wild (NV), pp. 386–389.
- MLDM-2012-JoutsijokiJ #case study #dataset
- DAGSVM vs. DAGKNN: An Experimental Case Study with Benthic Macroinvertebrate Dataset (HJ, MJ), pp. 439–453.
- SIGIR-2012-HuO #classification #dataset #using
- Genre classification for million song dataset using confidence-based classifiers combination (YH, MO), pp. 1083–1084.
- SAC-2012-GomiI #3d #dataset #image #mobile #multi #named
- MINI: a 3D mobile image browser with multi-dimensional datasets (AG, TI), pp. 989–996.
- HPDC-2012-HefeedaGA #approximate #clustering #dataset #distributed #scalability
- Distributed approximate spectral clustering for large-scale datasets (MH, FG, WAA), pp. 223–234.
- ICDAR-2011-AlaeiNP #benchmark #dataset #documentation #metric #segmentation
- A Benchmark Kannada Handwritten Document Dataset and Its Segmentation (AA, PN, UP), pp. 141–145.
- ICDAR-2011-QuiniouMSVMPM #dataset #named
- HAMEX — A Handwritten and Audio Dataset of Mathematical Expressions (SQ, HM, SPS, CVG, EM, SP, SM), pp. 452–456.
- SIGMOD-2011-DuanKSU #benchmark #comparison #dataset #metric #rdf
- Apples and oranges: a comparison of RDF benchmarks and real RDF datasets (SD, AK, KS, OU), pp. 145–156.
- SIGMOD-2011-KashyapP #agile #dataset #development #interface #query #xml
- Rapid development of web-based query interfaces for XML datasets with QURSED (AK, MP), pp. 1339–1342.
- VLDB-2012-BarskyKWH11 #correlation #dataset #mining #scalability #taxonomy
- Mining Flipping Correlations from Large Datasets with Taxonomies (MB, SK, TW, JH), pp. 370–381.
- OCSC-2011-AliprandiRMTM #dataset #rdf #semantics #web #wiki
- Extracting Events from Wikipedia as RDF Triples Linked to Widespread Semantic Web Datasets (CA, FR, AM, MT, SM), pp. 90–99.
- CIKM-2011-CachedaCFF #algorithm #analysis #dataset #nearest neighbour
- Improving k-nearest neighbors algorithms: practical application of dataset analysis (FC, VC, DF, VF), pp. 2253–2256.
- CIKM-2011-LiuYS #dataset #query
- Subject-oriented top-k hot region queries in spatial dataset (JL, GY, HS), pp. 2409–2412.
- CIKM-2011-SelvarajBSS #classification #dataset
- Semi-supervised SVMs for classification with unknown class proportions and a small labeled dataset (SKS, BB, SS, SKS), pp. 653–662.
- KDD-2011-CordeiroTTLKF #clustering #dataset #multi #pipes and filters #scalability
- Clustering very large multi-dimensional datasets with MapReduce (RLFC, CTJ, AJMT, JL, UK, CF), pp. 690–698.
- SIGIR-2011-LeeHWHS #dataset #graph #image #learning #multi #pipes and filters #scalability #using
- Multi-layer graph-based semi-supervised learning for large-scale image datasets using mapreduce (WYL, LCH, GLW, WHH, YFS), pp. 1121–1122.
- SIGMOD-2010-WangWLWWLTXL #dataset #detection #named
- MapDupReducer: detecting near duplicates over massive datasets (CW, JW, XL, WW, HW, HL, WT, JX, RL), pp. 1119–1122.
- VLDB-2010-LiD #dataset #probability #ranking
- Ranking Continuous Probabilistic Datasets (JL, AD), pp. 638–649.
- VLDB-2010-Matsudaira #3d #biology #dataset #scalability
- High-End Biological Imaging Generates Very Large 3D+ and Dynamic Datasets (PM), p. 3.
- VLDB-2010-MelnikGLRSTV #analysis #dataset #interactive #named
- Dremel: Interactive Analysis of Web-Scale Datasets (SM, AG, JJL, GR, SS, MT, TV), pp. 330–339.
- VLDB-2010-ParameswaranGR #concept #dataset #scalability #towards #web
- Towards The Web of Concepts: Extracting Concepts from Large Datasets (AGP, HGM, AR), pp. 566–577.
- MSR-2010-BachmannB #correlation #dataset #debugging #process #quality #re-engineering
- When process data quality affects the number of bugs: Correlations in software engineering datasets (AB, AB), pp. 62–71.
- WCRE-2010-NguyenAH #bias #case study #dataset #debugging
- A Case Study of Bias in Bug-Fix Datasets (THDN, BA, AEH), pp. 259–268.
- PLDI-2010-ChenHEFPTW #dataset #optimisation
- Evaluating iterative optimization across 1000 datasets (YC, YH, LE, GF, LP, OT, CW), pp. 448–459.
- STOC-2010-BravermanO #dataset #independence
- Measuring independence of datasets (VB, RO), pp. 271–280.
- CIKM-2010-CormodeKW #algorithm #dataset #scalability #set
- Set cover algorithms for very large datasets (GC, HJK, AW), pp. 479–488.
- CIKM-2010-HarpaleYGHY #dataset #multi #named #performance #personalisation
- CiteData: a new multi-faceted dataset for evaluating personalized search performance (AH, YY, SG, DH, ZY), pp. 549–558.
- ICML-2010-SyedR #dataset #identification
- Unsupervised Risk Stratification in Clinical Datasets: Identifying Patients at Risk of Rare Outcomes (ZS, IR), pp. 1023–1030.
- ICML-2010-TanWT #dataset #feature model #learning
- Learning Sparse SVM for Feature Selection on Very High Dimensional Datasets (MT, LW, IWT), pp. 1047–1054.
- ICPR-2010-AlvarezSVO #dataset
- Perceptual Color Texture Codebooks for Retrieving in Highly Diverse Texture Datasets (SÁ, AS, MV, XO), pp. 866–869.
- ICPR-2010-KimM #classification #dataset
- Dense Structure Inference for Object Classification in Aerial LIDAR Dataset (EK, GGM), pp. 3049–3052.
- ICPR-2010-PapaCF #classification #dataset #optimisation
- Optimizing Optimum-Path Forest Classification for Huge Datasets (JPP, FAMC, AXF), pp. 4162–4165.
- ICPR-2010-SodaI #composition #dataset #integration #learning
- Decomposition Methods and Learning Approaches for Imbalanced Dataset: An Experimental Integration (PS, GI), pp. 3117–3120.
- ICPR-2010-StottingerZKH #dataset #evaluation
- FeEval: A Dataset for Evaluation of Spatio-temporal Local Features (JS, SZ, RK, AH), pp. 499–502.
- RecSys-2010-PilaszyZT #dataset #feedback #matrix #performance
- Fast als-based matrix factorization for explicit and implicit feedback datasets (IP, DZ, DT), pp. 71–78.
- SAC-2010-BaralisCC #dataset #mining #persistent #scalability
- A persistent HY-Tree to efficiently support itemset mining on large datasets (EB, TC, SC), pp. 1060–1064.
- SAC-2010-LuccheseOP #dataset #generative #mining
- A generative pattern model for mining binary datasets (CL, SO, RP), pp. 1109–1110.
- ICDAR-2009-AntonacopoulosBPP #analysis #dataset #documentation #evaluation #layout #performance
- A Realistic Dataset for Performance Evaluation of Document Layout Analysis (AA, DB, CP, SP), pp. 296–300.
- CHI-2009-LeeSRCT #dataset #named #roadmap
- FacetLens: exposing trends and relationships to support sensemaking within faceted datasets (BL, GS, GGR, MC, DST), pp. 1293–1302.
- CIKM-2009-BalachandranPK #clustering #configuration management #dataset #documentation
- Interpretable and reconfigurable clustering of document datasets by deriving word-based rules (VB, DP, DK), pp. 1773–1776.
- CIKM-2009-Muntes-MuleroN #dataset #privacy #scalability
- Privacy and anonymization for very large datasets (VMM, JN), pp. 2117–2118.
- CIKM-2009-StoyanovichA #clustering #dataset
- Rank-aware clustering of structured datasets (JS, SAY), pp. 1429–1432.
- KDD-2009-DundarHBRR #case study #dataset #detection #learning #using
- Learning with a non-exhaustive training dataset: a case study: detection of bacteria cultures using optical-scattering technology (MD, EDH, AKB, JPR, BR), pp. 279–288.
- MLDM-2009-CelepcikayEO #dataset #using
- Regional Pattern Discovery in Geo-referenced Datasets Using PCA (OUC, CFE, CO), pp. 719–733.
- MLDM-2009-SegataB #dataset #performance #scalability
- Fast Local Support Vector Machines for Large Datasets (NS, EB), pp. 295–310.
- SAC-2009-OwensMR #dataset #mining
- Capturing truthiness: mining truth tables in binary datasets (CCOI, TMM, NR), pp. 1467–1474.
- ESEC-FSE-2009-BirdBADBFD #bias #dataset #debugging
- Fair and balanced?: bias in bug-fix datasets (CB, AB, EA, JD, AB, VF, PTD), pp. 121–130.
- SIGMOD-2008-PanZW #clustering #composition #dataset #matrix #named #performance #scalability
- CRD: fast co-clustering on large datasets utilizing sampling-based matrix decomposition (FP, XZ, WW), pp. 173–184.
- ICEIS-DISI-2008-MuldnerML #data access #dataset #policy #xml
- Succinct Access Control Policies for Published XML Datasets (TM, JKM, GL), pp. 380–385.
- CIKM-2008-ChoudharyMB #dataset #evolution #on the
- On quantifying changes in temporally evolving dataset (RC, SM, AB), pp. 1459–1460.
- CIKM-2008-LiuLNBMG #dataset #feature model #performance #preprocessor #realtime #scalability
- Real-time data pre-processing technique for efficient feature extraction in large scale datasets (YL, LVL, RSN, KB, PM, CLG), pp. 981–990.
- CIKM-2008-NguyenS #analysis #correlation #dataset #performance
- Fast correlation analysis on time series datasets (PN, NS), pp. 787–796.
- CIKM-2008-ShawXG #approximate #dataset
- Deriving non-redundant approximate association rules from hierarchical datasets (GS, YX, SG), pp. 1451–1452.
- ICML-2008-WolfeHK #dataset #distributed #scalability
- Fully distributed EM for very large datasets (JW, AH, DK), pp. 1184–1191.
- ICPR-2008-WatanabeK08a #dataset #scalability
- RANSAC-SVM for large-scale datasets (KW, TK), pp. 1–4.
- KDD-2008-DasSN #category theory #dataset #detection
- Anomaly pattern detection in categorical datasets (KD, JGS, DBN), pp. 169–176.
- FSE-2008-OsterweilCEPWBH #dataset #experience #process #using #workflow
- Experience in using a process language to define scientific workflow and generate dataset provenance (LJO, LAC, AME, RMP, AEW, ERB, JLH), pp. 319–329.
- SIGMOD-2007-XiaoT #dataset #named #privacy #towards
- M-invariance: towards privacy preserving re-publication of dynamic datasets (XX, YT), pp. 689–700.
- HIMI-IIE-2007-CollinsNMRBCPFHP #collaboration #dataset
- The Karst Collaborative Workspace for Analyzing and Annotating Scientific Datasets (LMC, DEN, MLBM, JVR, MAB, CRC, JEP, BFS, SKH, JCP), pp. 3–12.
- KDD-2007-DasS #category theory #dataset #detection
- Detecting anomalous records in categorical datasets (KD, JGS), pp. 220–229.
- PADL-2007-Costa #dataset #performance #prolog
- Prolog Performance on Larger Datasets (VSC), pp. 185–199.
- DRR-2006-ZhangA #dataset #towards
- Toward quantifying the amount of style in a dataset (XZ, SA).
- SIGMOD-2006-KiferG #dataset #injection
- Injecting utility into anonymized datasets (DK, JG), pp. 217–228.
- VLDB-2006-GemullaLH #dataset #evolution #maintenance
- A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets (RG, WL, PJH), pp. 595–606.
- VLDB-2006-JiTT #3d #dataset #mining
- Mining Frequent Closed Cubes in 3D Datasets (LJ, KLT, AKHT), pp. 811–822.
- ICPR-v1-2006-AnC #dataset
- Finding Rule Groups to Classify High Dimensional Gene Expression Datasets (JA, YPPC), pp. 1196–1199.
- ICPR-v3-2006-FelsbergG #dataset #multi #named #robust #scalability
- P-Channels: Robust Multivariate M-Estimation of Large Datasets (MF, GHG), pp. 262–267.
- ICPR-v4-2006-Jun #comparison #dataset #detection
- A Peer Dataset Comparison Outlier Detection Model Applied to Financial Surveillance (TJ), pp. 900–903.
- SAC-2006-FangG #bound #dataset
- Boundary surface extraction and rendering for volume datasets (SF, PG), pp. 1356–1360.
- SAC-2006-NemalhabibS #algorithm #category theory #clustering #dataset #named
- CLUC: a natural clustering algorithm for categorical datasets based on cohesion (AN, NS), pp. 637–638.
- ICDAR-2005-AgrawalBMV #dataset #named #online #representation #xml
- UPX: A New XML Representation for Annotated Datasets of Online Handwriting Data (MA, KB, SM, LV), pp. 1161–1165.
- ICEIS-v2-2005-DoP #dataset #mining #scalability #visualisation
- Mining Very Large Datasets with SVM and Visualization (TND, FP), pp. 127–141.
- KDD-2005-JinWPPA #dataset #graph
- Discovering frequent topological structures from graph datasets (RJ, CW, DP, SP, GA), pp. 606–611.
- KDD-2005-ZakiPAS #algorithm #category theory #clustering #dataset #effectiveness #mining #named
- CLICKS: an effective algorithm for mining subspace clusters in categorical datasets (MJZ, MP, IA, TS), pp. 736–742.
- MLDM-2005-FerrandizB05a #dataset #evaluation
- Supervised Evaluation of Dataset Partitions: Advantages and Practice (SF, MB), pp. 600–609.
- MLDM-2005-SiaL #clustering #dataset #scalability #using
- Clustering Large Dynamic Datasets Using Exemplar Points (WS, MML), pp. 163–173.
- SIGMOD-2004-CongXPTY #array #dataset #named
- FARMER: Finding Interesting Rule Groups in Microarray Datasets (GC, AKHT, XX, FP, JY), pp. 143–154.
- VLDB-2004-HoweM #algebra #dataset
- Algebraic Manipulation of Scientific Datasets (BH, DM), pp. 924–935.
- CIKM-2004-ChenL #clustering #dataset #named #scalability #visualisation
- ClusterMap: labeling clusters in large datasets via visualization (KC, LL), pp. 285–293.
- CIKM-2004-ChungJM #clustering #dataset #mining #using
- Mining gene expression datasets using density-based clustering (SC, JJ, DM), pp. 150–151.
- ICPR-v4-2004-WillisC #3d #dataset #multi #symmetry
- Alignment of Multiple Non-Overlapping Axially Symmetric 3D Datasets (ARW, DBC), pp. 96–99.
- KDD-2004-TruongLB #dataset #learning #random #using
- Learning a complex metabolomic dataset using random forests and support vector machines (YT, XL, CB), pp. 835–840.
- SIGIR-2004-DavidovGM #categorisation #dataset #generative
- Parameterized generation of labeled datasets for text categorization based on a hierarchical directory (DD, EG, SM), pp. 250–257.
- SAC-2004-AdamJA #dataset #detection
- Neighborhood based detection of anomalies in high dimensional spatio-temporal sensor datasets (NRA, VPJ, VA), pp. 576–583.
- SAC-2004-CarswellGN #dataset #multi #semantics #transaction
- Wireless spatio-semantic transactions on multimedia datasets (JDC, KG, MN), pp. 1201–1205.
- DRR-2003-HauserSSDST #dataset
- Correcting OCR text by association with historical datasets (SEH, JS, TFS, DDF, SS, GRT), pp. 84–93.
- VLDB-2003-LinLYZ #dataset #multi #scalability
- Multiscale Histograms: Summarizing Topological Relations in Large Spatial Datasets (XL, QL, YY, XZ), pp. 814–825.
- ICEIS-v2-2003-DoP #algorithm #dataset #mining #scalability
- Mining Very Large Datasets with Support Vector Machine Algorithms (TND, FP), pp. 140–147.
- ICML-2003-LeskovecS #dataset #linear #programming
- Linear Programming Boosting for Uneven Datasets (JL, JST), pp. 456–463.
- ICML-2003-ZhuWC #dataset #scalability
- Eliminating Class Noise in Large Datasets (XZ, XW, QC), pp. 920–927.
- KDD-2003-El-HajjZ #dataset #interactive #matrix #mining #performance #scalability
- Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining (MEH, ORZ), pp. 109–118.
- KDD-2003-KoyuturkG #dataset #framework #named
- PROXIMUS: a framework for analyzing very high dimensional discrete-attributed datasets (MK, AG), pp. 147–156.
- KDD-2003-PanCTYZ #biology #dataset #named
- Carpenter: finding closed patterns in long biological datasets (FP, GC, AKHT, JY, MJZ), pp. 637–642.
- KDD-2003-PeterCG #algorithm #clustering #dataset #scalability
- New unsupervised clustering algorithm for large datasets (WP, JC, CG), pp. 643–648.
- MLDM-2003-LeleuRBE #dataset #mining #named
- GO-SPADE: Mining Sequential Patterns over Datasets with Consecutive Repetitions (ML, CR, JFB, GE), pp. 293–306.
- SAC-2003-ChenL #clustering #dataset #visualisation
- Cluster Rendering of Skewed Datasets via Visualization (KC, LL), pp. 909–916.
- CIKM-2002-ZhaoK #algorithm #clustering #dataset #documentation #evaluation
- Evaluation of hierarchical clustering algorithms for document datasets (YZ, GK), pp. 515–524.
- KDD-2002-RidgewayM #analysis #dataset
- Bayesian analysis of massive datasets via particle filters (GR, DM), pp. 5–13.
- KDD-2002-TantrumMS #clustering #dataset #modelling #scalability
- Hierarchical model-based clustering of large datasets through fractionation and refractionation (JT, AM, WS), pp. 183–190.
- ICEIS-v1-2001-KotsisWFM #dataset #multi #novel #visualisation
- Novel Data Visualisation and Exploration in Multidimensional Datasets (NK, GRSW, JDF, DRM), pp. 170–175.
- ICML-2001-DomeniconiG #approach #approximate #classification #dataset #multi #nearest neighbour #performance #query #scalability
- An Efficient Approach for Approximating Multi-dimensional Range Queries and Nearest Neighbor Classification in Large Datasets (CD, DG), pp. 98–105.
- KDD-2001-BeygelzimerPM #category theory #dataset #performance #scalability #visualisation
- Fast ordering of large categorical datasets for better visualization (AB, CSP, SM), pp. 239–244.
- KDD-2000-BarbaraC #clustering #dataset #using
- Using the fractal dimension to cluster datasets (DB, PC), pp. 260–264.
- KDD-2000-Yang #3d #dataset #interactive #relational #scalability
- Interactive exploration of very large relational datasets through 3D dynamic projections (LY), pp. 236–243.
- KDD-2000-ZhangDR #constraints #dataset #scalability
- Exploring constraints to efficiently mine emerging patterns from large high-dimensional datasets (XZ, GD, KR), pp. 310–314.
- SIGMOD-1999-MankuRL #dataset #online #order #performance #random #scalability #statistics
- Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets (GSM, SR, BGL), pp. 251–262.
- KDD-1999-DaviesM #dataset #network
- Bayesian Networks for Lossless Dataset Compression (SD, AWM), pp. 387–391.
- VLDB-1998-GehrkeRG #dataset #framework #named #performance #scalability
- RainForest — A Framework for Fast Decision Tree Construction of Large Datasets (JG, RR, VG), pp. 416–427.
- VLDB-1998-KnorrN #algorithm #dataset #mining #scalability
- Algorithms for Mining Distance-Based Outliers in Large Datasets (EMK, RTN), pp. 392–403.
- VLDB-1998-ShuklaDN #dataset #multi
- Materialized View Selection for Multidimensional Datasets (AS, PD, JFN), pp. 488–499.
- KDD-1998-AlsabtiRS #classification #dataset #named #scalability
- CLOUDS: A Decision Tree Classifier for Large Datasets (KA, SR, VS), pp. 2–8.
- KDD-1998-OatesJ #dataset #modelling #scalability
- Large Datasets Lead to Overly Complex Models: An Explanation and a Solution (TO, DJ), pp. 294–298.
- SIGMOD-1997-KornJF #ad hoc #dataset #query #scalability #sequence
- Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences (FK, HVJ, CF), pp. 289–300.
- SIGMOD-1997-LivnyRBCDLMW #dataset #named #query #scalability #visualisation
- DEVise: Integrated Querying and Visualization of Large Datasets (ML, RR, KSB, GC, DD, SL, JM, RKW), pp. 301–312.
- SIGMOD-1997-LivnyRBCDLMW97a #dataset #named #query #scalability #visual notation
- DEVise: Integrated Querying and Visual Exploration of Large Datasets (Demo Abstract) (ML, RR, KSB, GC, DD, SL, JM, RKW), pp. 517–520.
- KDD-1997-ZupanBBC #approach #composition #data mining #dataset #mining
- A Dataset Decomposition Approach to Data Mining and Machine Discovery (BZ, MB, IB, BC), pp. 299–302.
- SIGMOD-1995-FaloutsosL #algorithm #dataset #multi #named #performance #visualisation
- FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets (CF, KIL), pp. 163–174.
- KDD-1995-StolorzNMMSSYNCMF #data mining #dataset #mining #performance #scalability
- Fast Spatio-Temporal Data Mining of Large Geophysical Datasets (PES, HN, EM, RRM, ECS, JRS, JY, KWN, SYC, CRM, JDF), pp. 300–305.