Tag #dataset
294 papers:
EDM-2019-JensenHD #detection #modelling #student- Generalizability of Sensor-Free Affect Detection Models in a Longitudinal Dataset of Tens of Thousands of Students (EJ, SH, SKD).
ICSME-2019-LevinY #scalability #source code- Processing Large Datasets of Fined Grained Source Code Changes (SL, AY), pp. 382–385.
ICSME-2019-NewmanDAPKH19a #open data- An Open Dataset of Abbreviations and Expansions (CDN, MJD, RSA, AP, DK, EH0), p. 280.
MSR-2019-AhluwaliaFP #fault #named #predict- Snoring: a noise in defect prediction datasets (AA, DF, MDP), pp. 63–67.
MSR-2019-BiswasIHR #python- Boa meets python: a boa dataset of data science software in python language (SB, MJI, YH, HR), pp. 577–581.
MSR-2019-JoshiC #agile #git #named- RapidRelease: a dataset of projects and issues on github with rapid releases (SDJ, SC), pp. 587–591.
MSR-2019-PietriSZ #development #graph- The software heritage graph dataset: public software development under one roof (AP, DS, SZ), pp. 138–142.
MSR-2019-PontaPSBD #open source- A manually-curated dataset of fixes to vulnerabilities of open-source software (SEP, HP, AS, MB, CD), pp. 383–387.
MSR-2019-RaduN #debugging #non-functional- A dataset of non-functional bugs (AR, SN), pp. 399–403.
MSR-2019-WangSL0 #android #metadata #named #reliability #towards- RmvDroid: towards a reliable Android malware dataset with app metadata (HW, JS, HL, YG0), pp. 404–408.
MSR-2019-WickertREDM #encryption #parametricity- A dataset of parametric cryptographic misuses (AKW, MR, ME, AD, MM), pp. 96–100.
CIKM-2019-ChenMLZM #named #scalability #web- TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions (JC, JM, YL, MZ0, SM), pp. 2485–2488.
CIKM-2019-ChenWCKQ #generative #query #towards- Towards More Usable Dataset Search: From Query Characterization to Snippet Generation (JC, XW, GC0, EK, YQ), pp. 2445–2448.
CIKM-2019-Cohen-ShapiraRS #named #recommendation #representation #visual notation- AutoGRD: Model Recommendation Through Graphical Dataset Representation (NCS, LR, BS, GK, RV), pp. 821–830.
CIKM-2019-SunAJHS #flexibility #named- MithraLabel: Flexible Dataset Nutritional Labels for Responsible Data Science (CS, AA, HVJ, BH, JS), pp. 2893–2896.
ECIR-p1-2019-LinjordetB #modelling- Impact of Training Dataset Size on Neural Answer Selection Models (TL, KB), pp. 828–835.
ECIR-p2-2019-Dosso #keyword #rdf- Keyword Search on RDF Datasets (DD), pp. 332–336.
ICML-2019-CornishVBDD #scalability- Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets (RC, PV, ABC, GD, AD), pp. 1351–1360.
ICML-2019-GhadikolaeiGFS #big data #learning- Learning and Data Selection in Big Datasets (HSG, HGG, CF, MS), pp. 2191–2200.
ESEC-FSE-2019-MiryeganehAH #approach #automation #integration #towards- An IR-based approach towards automated integration of geo-spatial datasets in map-based software systems (NM, MA, HH), pp. 946–954.
ICSE-2019-DmeiriTWBLDVR #mining #named- BugSwarm: mining and continuously growing a dataset of reproducible failures and fixes (DAT, ND, YW, AB, YCL, PTD, BV, CRG), pp. 339–349.
ICTSS-2019-NakajimaC #generative #machine learning #source code #testing- Generating Biased Dataset for Metamorphic Testing of Machine Learning Programs (SN0, TYC), pp. 56–64.
ICSME-2018-0008ZOPLB #analysis #re-engineering #sentiment- Two Datasets for Sentiment Analysis in Software Engineering (BL0, FZ, RO, MDP, ML, GB), p. 712.
MSR-2018-GaoYJLYZ #concurrent #named #testing- Jbench: a dataset of data races for concurrency testing (JG, XY, YJ0, HL0, WY, XZ), pp. 6–9.
MSR-2018-GeigerMPPNB #android #commit #graph- A graph-based dataset of commit history of real-world Android apps (FXG, IM, LP, FP, DDN, AB), pp. 30–33.
MSR-2018-GkortzisMS #named #open source #security- VulinOSS: a dataset of security vulnerabilities in open-source systems (AG, DM, DS), pp. 18–21.
MSR-2018-MarkovtsevL #git- Public git archive: a big code dataset for all (VM, WL), pp. 34–37.
MSR-2018-MartinsAL #java #named- 50K-C: a dataset of compilable, and compiled, Java projects (PM0, RA, CVL), pp. 1–5.
MSR-2018-ProkschAN #developer #empirical #process- Enriched event streams: a general dataset for empirical studies on in-IDE activities of software developers (SP, SA, SN), pp. 62–65.
MSR-2018-SahaLLYP #debugging #java #scalability- Bugs.jar: a large-scale, diverse dataset of real-world Java bugs (RKS, YL, WL, HY, MRP), pp. 10–13.
MSR-2018-XuZ #kernel #linux #multi- A multi-level dataset of linux kernel patchwork (YX, MZ), pp. 54–57.
MSR-2018-YuLYWW #git- A dataset of duplicate pull-requests in github (YY0, ZL, GY, TW0, HW), pp. 22–25.
SANER-2018-SobreiraDDMM #debugging #fault- Dissection of a bug dataset: Anatomy of 395 patches from Defects4J (VS, TD, FM, MM, MdAM), pp. 130–140.
CIG-2018-AungBDCKYW #learning #predict #scalability- Predicting Skill Learning in a Large, Longitudinal MOBA Dataset (MA, VB, AD, PIC, AVK, CY, ARW), pp. 1–7.
CIG-2018-VariaTKK #3d #analysis #game studies- A Refined 3D Dataset for the Analysis of Player Actions in Exertion Games (CV, GT, KK, SDK), pp. 1–4.
CIKM-2018-AfsharPPSHS #named #scalability- COPA: Constrained PARAFAC2 for Sparse & Large Datasets (AA, IP, EEP, ES, JCH, JS), pp. 793–802.
CIKM-2018-HoangVN #benchmark #detection #metric #named #topic- W2E: A Worldwide-Event Benchmark Dataset for Topic Detection and Tracking (TAH, KDV, WN), pp. 1847–1850.
CIKM-2018-LevchenkoYAMKS #distributed #named #sketching- Spark-parSketch: A Massively Distributed Indexing of Time Series Datasets (OL, DEY, RA, FM, BK, DES), pp. 1951–1954.
ECIR-2018-DurRF #architecture #benchmark #challenge #lessons learnt #metric- Reproducing a Neural Question Answering Architecture Applied to the SQuAD Benchmark Dataset: Challenges and Lessons Learned (AD, AR, PF), pp. 102–113.
ICML-2018-YoonJS18a #generative #modelling #multi #named #network #predict #using- RadialGAN: Leveraging multiple datasets to improve target-specific predictive models using Generative Adversarial Networks (JY, JJ, MvdS), pp. 5685–5693.
ICPR-2018-ChowdhuryAT0R #recognition- MSU-AVIS dataset: Fusing Face and Voice Modalities for Biometric Recognition in Indoor Surveillance Videos (AC, YA, LT, XL0, AR), pp. 3567–3573.
ICPR-2018-FerrariBB #identification- Extended YouTube Faces: a Dataset for Heterogeneous Open-Set Face Identification (CF, SB, ADB), pp. 3408–3413.
ICPR-2018-LiangLJXL #benchmark #metric #multi #named #predict- SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction (LL, LL, LJ, DX, ML), pp. 1598–1603.
ICPR-2018-MerchantSKM #image #using- Appearance-based data augmentation for image datasets using contrast preserving sampling (AKM, TQS, BK, RM), pp. 1235–1240.
ICPR-2018-TranLPHKTNP #analysis #multi- A multi-modal multi-view dataset for human fall analysis and preliminary investigation on modality (THT, TLL, DTP, VNH, VMK, QTT, TSN, CP), pp. 1947–1952.
ICPR-2018-TuggenerESPS #classification #detection #segmentation- DeepScores - A Dataset for Segmentation, Detection and Classification of Tiny Objects (LT, IE, JS, MP, TS), pp. 3704–3709.
KDD-2018-ChenCL #open data- Rotation-blended CNNs on a New Open Dataset for Tropical Cyclone Image-to-intensity Regression (BC, BFC, HTL), pp. 90–99.
KDD-2018-RijnH - Hyperparameter Importance Across Datasets (JNvR, FH), pp. 2367–2376.
JCDL-2017-DuretecR0 #benchmark #metric- A Text Extraction Software Benchmark Based on a Synthesized Dataset (KD, AR, CB0), pp. 109–118.
JCDL-2017-SinghNGBMG #behaviour #case study #reuse- Citation Sentence Reuse Behavior of Scientists: A Case Study on Massive Bibliographic Text Dataset of Computer Science (MS0, AN, DG, NAB, AM0, PG), pp. 277–280.
MSR-2017-AivaloglouHMR #source code- A dataset of scratch programs: scraped, shaped and scored (EA, FH, JML, GR), pp. 511–514.
MSR-2017-MadeyskiK #fault #idea #predict- Continuous defect prediction: the idea and a related dataset (LM, MK), pp. 515–518.
MSR-2017-OrellanaLMD #difference #integration #on the #testing- On the differences between unit and integration testing in the travistorrent dataset (GO, GL, AM, SD), pp. 451–454.
MSR-2017-RoblesHHCF #git #modelling #uml- An extensive dataset of UML models in GitHub (GR, THQ, RH, MRVC, MAF), pp. 519–522.
MSR-2017-SadatBM - Rediscovery datasets: connecting duplicate reports (MS, ABB, AVM), pp. 527–530.
MSR-2017-ZhuLRC #semantics #version control- A dataset for dynamic discovery of semantic changes in version controlled software histories (CZ, YL0, JR, MC), pp. 523–526.
AIIDE-2017-LinGKS #named #research- STARDATA: A StarCraft AI Research Dataset (ZL, JG, VK, GS), pp. 50–56.
ECIR-2017-BhattacharjeeA #algorithm #clustering #incremental #nearest neighbour- Batch Incremental Shared Nearest Neighbor Density Based Clustering Algorithm for Dynamic Datasets (PB, AA), pp. 568–574.
ICML-2017-ValeraG #automation #statistics- Automatic Discovery of the Statistical Types of Variables in a Dataset (IV, ZG), pp. 3521–3529.
ICML-2017-ZhouZIJWS #multi #testing- When can Multi-Site Datasets be Pooled for Regression? Hypothesis Tests, l₂-consistency and Neuroscience Applications (HHZ, YZ, VKI, SCJ, GW, VS), pp. 4170–4179.
MSR-2016-CosentinoIC #git- Findings from GitHub: methods, datasets and limitations (VC, JLCI, JC), pp. 137–141.
MSR-2016-ProkschANM #c# #syntax- A dataset of simplified syntax trees for C# (SP, SA, SN, MM), pp. 476–479.
MSR-2016-YangKYI #code review #mining #overview #people #process #repository- Mining the modern code review repositories: a dataset of people, process and product (XY, RGK, NY, HI), pp. 460–463.
MSR-2016-ZhuZM #issue tracking #multi- Multi-extract and multi-level dataset of mozilla issue tracking history (JZ, MZ, HM), pp. 472–475.
SANER-2016-KadarHFG #assessment #maintenance #refactoring- A Code Refactoring Dataset and Its Assessment Regarding Software Maintainability (IK, PH, RF, TG), pp. 599–603.
CIKM-2016-BleifussBFRW0PN #approximate #dependence #functional #scalability- Approximate Discovery of Functional Dependencies for Large Datasets (TB, SB, JF, JR, GW, SK0, TP, FN), pp. 1803–1812.
CIKM-2016-CaoY #benchmark #metric #modelling #named #network #platform #social- ASNets: A Benchmark Dataset of Aligned Social Networks for Cross-Platform User Modeling (XC, YY0), pp. 1881–1884.
CIKM-2016-NguyenTTN #named #social #summary- SoLSCSum: A Linked Sentence-Comment Dataset for Social Context Summarization (MTN, CXT, DVT, MLN), pp. 2409–2412.
ECIR-2016-BotevaGSR #information retrieval #learning #rank- A Full-Text Learning to Rank Dataset for Medical Information Retrieval (VB, DGG, AS, SR), pp. 716–722.
ICPR-2016-MoazzenT #approximate #clustering- Sampling based approximate spectral clustering ensemble for partitioning datasets (YM, KT), pp. 1630–1635.
ICPR-2016-MurguiaRA #adaptation #architecture #evaluation #modelling #network #parallel- Evaluation of the background modeling method Auto-Adaptive Parallel Neural Network Architecture in the SBMnet dataset (MICM, JARQ, GRA), pp. 137–142.
ICPR-2016-OrtegoSM #estimation #multi #re-engineering- Rejection based multipath reconstruction for background estimation in SBMnet 2016 dataset (DO, JCS, JMM0), pp. 114–119.
ICPR-2016-YuenMT #algorithm #evaluation #on the- On looking at faces in an automobile: Issues, algorithms and evaluation on naturalistic driving dataset (KY, SM, MMT), pp. 2777–2782.
KDD-2016-GaoP #named #relational- Squish: Near-Optimal Compression for Archival of Relational Datasets (YG, AGP), pp. 1575–1584.
KDD-2016-MaiAS #algorithm #clustering #named #performance #scalability- AnyDBC: An Efficient Anytime Density-based Clustering Algorithm for Very Large Complex Datasets (STM, IA, MS), pp. 1025–1034.
DRR-2015-ChenSWLHI #analysis #documentation #layout- Ground truth model, tool, and dataset for layout analysis of historical documents (KC, MS, HW, ML, JH, RI), p. 940204.
HT-2015-RoutB #algorithm #ranking #twitter- A Human-annotated Dataset for Evaluating Tweet Ranking Algorithms (DPR, KB), pp. 95–99.
PODS-2015-Cormode #scalability #summary- Compact Summaries over Large Datasets (GC), pp. 157–158.
TPDL-2015-LlewellynGAOT #topic #twitter- Extracting a Topic Specific Dataset from a Twitter Archive (CL, CG, BA, JO, RT), pp. 364–367.
VLDB-2015-BhattacherjeeCH #trade-off #version control- Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff (SB, AC, SH, AD, AGP), pp. 1346–1357.
VLDB-2015-HarbiAKM #query #rdf- Evaluating SPARQL Queries on Massive RDF Datasets (RH, IA, PK, NM), pp. 1848–1859.
EDM-2015-VossSMS #approach #learning #matrix- A Transfer Learning Approach for Applying Matrix Factorization to Small ITS Datasets (LV, CS, CM, LST), pp. 372–375.
ICSME-2015-JansenH #industrial #smell #spreadsheet- Code smells in spreadsheet formulas revisited on an industrial dataset (BJ, FH), pp. 372–380.
MSR-2015-AltingerSDW #embedded #fault #industrial #modelling #novel #predict- A Novel Industry Grade Dataset for Fault Prediction Based on Model-Driven Developed Automotive Embedded Software (HA, SS, YD, FW), pp. 494–497.
MSR-2015-GermanAH #git #linux #process- A Dataset of the Activity of the Git Super-repository of Linux in 2012 (DMG, BA, AEH), pp. 470–473.
MSR-2015-HabayebMMBB #fault- The Firefox Temporal Defect Dataset (MH, AVM, SSM, LB, AB), pp. 498–501.
MSR-2015-KrutzMMRPFS #android #open source- A Dataset of Open-Source Android Applications (DEK, MM, SAM, AR, JP, AF, JS), pp. 522–525.
MSR-2015-MauczkaBSG #commit #developer- Dataset of Developer-Labeled Commit Messages (AM, FB, CS, TG), pp. 490–493.
MSR-2015-OhiraKYYMLFHIM #classification #debugging- A Dataset of High Impact Bugs: Manually-Classified Issue Reports (MO, YK, YY, HY, YM, NL, KF, HH, AI, KiM), pp. 518–521.
MSR-2015-PalombaNTBOPL #evaluation #named #open data #smell- Landfill: An Open Dataset of Code Smells with Public Evaluation (FP, DDN, MT, GB, RO, DP, ADL), pp. 482–485.
MSR-2015-SawantB #api- A Dataset for API Usage (AAS, AB), pp. 506–509.
MSR-2015-WermelingerY #architecture #evolution- An Architectural Evolution Dataset (MW, YY), pp. 502–505.
MSR-2015-Zacchiroli #metadata #source code- The Debsources Dataset: Two Decades of Debian Source Code Metadata (SZ), pp. 466–469.
SCAM-2015-AivaloglouHH #scalability #spreadsheet- A grammar for spreadsheet formulas evaluated on two large datasets (EA, DH, FH), pp. 121–130.
VS-Games-2015-ChalasFFSK #3d #generative- Generation of Variable Human Faces from 3D Scan Dataset (IC, ZF, KF, JS, BK), pp. 1–8.
CSCW-2015-QuattroneCM #bias- There’s No Such Thing as the Perfect Map: Quantifying Bias in Spatial Crowd-sourcing Datasets (GQ, LC, PDM), pp. 1021–1032.
ICEIS-v2-2015-SarinhoLS #linked data #open data #question- Can You Find All the Data You Expect in a Linked Dataset? (WTS, BFL, DS), pp. 648–655.
CIKM-2015-LiXJL #adaptation #approach- Differentially Private Histogram Publication for Dynamic Datasets: an Adaptive Sampling Approach (HL, LX0, XJ, JL), pp. 1001–1010.
CIKM-2015-SinghPK0MG #case study #predict- The Role Of Citation Context In Predicting Long-Term Citation Profiles: An Experimental Study Based On A Massive Bibliographic Text Dataset (MS0, VP, SK, TC0, AM0, PG), pp. 1271–1280.
ICML-2015-BarbosaENW #distributed #power of- The Power of Randomization: Distributed Submodular Maximization on Massive Datasets (RdPB, AE, HLN, JW), pp. 1236–1244.
ICML-2015-MaLF #analysis #canonical #correlation #linear #scalability- Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis (ZM, YL, DPF), pp. 169–178.
KDD-2015-CaoWYR #online #scalability- Online Outlier Exploration Over Large Datasets (LC, MW, DY, EAR), pp. 89–98.
RecSys-2015-Ben-ShimonTFSRH #challenge- RecSys Challenge 2015 and the YOOCHOOSE Dataset (DBS, AT, MF, BS, LR, JH), pp. 357–358.
SIGIR-2015-MorenoD #adaptation #metric #semistructured data- Adapted B-CUBED Metrics to Unbalanced Datasets (JGM, GD), pp. 911–914.
ASE-2015-NamK #fault #named #predict- CLAMI: Defect Prediction on Unlabeled Datasets (T) (JN, SK), pp. 452–463.
ICSE-v2-2015-HermansM #analysis #email #spreadsheet- Enron’s Spreadsheets and Related Emails: A Dataset and Analysis (FH, ERMH), pp. 7–16.
SAC-2015-RochaRCOMVADGF #algorithm #classification #documentation #named #performance #using- G-KNN: an efficient document classification algorithm for sparse datasets on GPUs using KNN (LCdR, GSR, RC, RSO, DM, FV, GA, SD, MAG, RF), pp. 1335–1338.
DRR-2014-BrunoL #documentation #open data #recognition #research- The Lehigh Steel Collection: a new open dataset for document recognition research (BB, DPL), p. ?–9.
JCDL-2014-CastroSR #data transformation #lightweight #ontology #research #workflow- Creating lightweight ontologies for dataset description practical applications in a cross-domain research data management workflow (JAC, JRdS, CR), pp. 313–316.
JCDL-2014-HasanGFBM #framework #library- Data mapping framework in a digital library with computational epidemiology datasets (SMSH, SG, EAF, KRB, MVM), pp. 449–450.
JCDL-2014-LlewellynRBKSJ #information management- Building a dataset of sensitive information (CL, LR, RB, SK, MS, RvJ), pp. 493–494.
SIGMOD-2014-SatishSPSPHSYD #framework #graph #navigation #using- Navigating the maze of graph analytics frameworks using massive graph datasets (NS, NS, MMAP, JS, JP, MAH, SS, ZY, PD), pp. 979–990.
VLDB-2015-MozafariSFJM14 #learning #scalability- Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning (BM, PS, MJF, MIJ, SM), pp. 125–136.
EDM-2014-LiuMK #testing- Interpreting model discovery and testing generalization to a new dataset (RL0, EAM, KRK), pp. 107–113.
ICSME-2014-ThongtanunamYYKCFI #code review #named #overview #visualisation- ReDA: A Web-Based Visualization Tool for Analyzing Modern Code Review Dataset (PT, XY, NY, RGK, AECC, KF, HI), pp. 605–608.
MSR-2014-GousiosZ #development #research- A dataset for pull-based development research (GG, AZ), pp. 368–371.
MSR-2014-LazarRS14a #debugging #generative- Generating duplicate bug datasets (AL, SR, BS), pp. 392–395.
MSR-2014-MurakamiHK - A dataset of clone references with gaps (HM, YH, SK), pp. 412–415.
MSR-2014-PassosC #feature model #kernel #linux- A dataset of feature additions and feature removals from the Linux kernel (LTP, KC), pp. 376–379.
MSR-2014-RoblesRSVG #challenge #overview- FLOSS 2013: a survey dataset about free software contributors: challenges for curating, sharing, and combining (GR, LAR, AS, BV, JMGB), pp. 396–399.
MSR-2014-SainiSOL #debugging- A dataset for maven artifacts and bug patterns found in them (VS, HS, JO, CVL), pp. 416–419.
MSR-2014-WilliamsRMRK #modelling- Models of OSS project meta-information: a dataset of three forges (JRW, DDR, NDM, JDR, DSK), pp. 408–411.
MSR-2014-ZhangH #energy #mining- A green miner’s dataset: mining the impact of software change on energy consumption (CZ, AH), pp. 400–403.
HCI-AIMT-2014-RuffieuxLMK #gesture #overview #recognition- A Survey of Datasets for Human Gesture Recognition (SR, DL, EM, OAK), pp. 337–348.
HIMI-DE-2014-GombosK #query #recommendation- SPARQL Query Writing with Recommendations Based on Datasets (GG, AK), pp. 310–319.
ICEIS-v1-2014-TimoteoVF #analysis #case study #network #project management- Evaluating Artificial Neural Networks and Traditional Approaches for Risk Analysis in Software Project Management — A Case Study with PERIL Dataset (CT, MV, SF), pp. 472–479.
ECIR-2014-BelloginSVS #challenge #evaluation #web- Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track (AB, TS, APdV, AS), pp. 430–436.
ECIR-2014-CarageaWCWRCWG #big data- CiteSeerX: A Scholarly Big Dataset (CC, JW, AMC, KW, JPFR, HHC, ZW, CLG), pp. 311–322.
ICPR-2014-Calvo-ZaragozaO #music #recognition- Recognition of Pen-Based Music Notation: The HOMUS Dataset (JCZ, JO), pp. 3038–3043.
ICPR-2014-FletcherI #evaluation #quality- Quality Evaluation of an Anonymized Dataset (SF, MZI), pp. 3594–3599.
ICPR-2014-SandhanC #hybrid #pattern matching #pattern recognition #recognition- Handling Imbalanced Datasets by Partially Guided Hybrid Sampling for Pattern Recognition (TS, JYC), pp. 1449–1453.
ICPR-2014-WangS #automation #multi #segmentation #using- Automatic Multi-organ Segmentation in Non-enhanced CT Datasets Using Hierarchical Shape Priors (CW, ÖS), pp. 3327–3332.
MLDM-2014-JavedA #classification #network #social #using- Creation of Bi-lingual Social Network Dataset Using Classifiers (IJ, HA), pp. 523–533.
MLDM-2014-WaiyamaiS #approach #classification- A Cost-Sensitive Based Approach for Improving Associative Classification on Imbalanced Datasets (KW, PS), pp. 31–42.
HPDC-2014-SuAWBS #analysis #correlation #distributed #parallel- Supporting correlation analysis on scientific datasets in parallel and distributed settings (YS, GA, JW, AB, HWS), pp. 191–202.
ICDAR-2013-ShivramRSG #named- IBM_UB_1: A Dual Mode Unconstrained English Handwriting Dataset (AS, CR, SS, VG), pp. 13–17.
JCDL-2013-GozaliKS #library- Constructing an anonymous dataset from the personal digital photo libraries of mac app store users (JPG, MYK, HS), pp. 305–308.
MSR-2013-BinkleyLPHV #identifier- A dataset for evaluating identifier splitters (DB, DL, LLP, EH, KVS), pp. 401–404.
MSR-2013-ButlerWYS #identifier #named- INVocD: identifier name vocabulary dataset (SB, MW, YY, HS), pp. 405–408.
MSR-2013-DitHPK #evaluation #maintenance- A dataset from change history to support evaluation of software maintenance tasks (BD, AH, DP, HHK), pp. 131–134.
MSR-2013-GoeminneCM #ecosystem #gnome- A historical dataset for the gnome ecosystem (MG, MC, TM), pp. 225–228.
MSR-2013-Gousios - The GHTorrent dataset and tool suite (GG), pp. 233–236.
MSR-2013-HamasakiKYCFI #code review #overview #repository #what- Who does what during a code review? datasets of OSS peer review repositories (KH, RGK, NY, AECC, KF, HI), pp. 49–52.
MSR-2013-JanjicHSA #research #reuse #source code- An unabridged source code dataset for research in software reuse (WJ, OH, MS, CA), pp. 339–342.
MSR-2013-LamkanfiPD #debugging #eclipse #fault #mining- The eclipse and mozilla defect tracking dataset: a genuine dataset for mining bug information (AL, JP, SD), pp. 203–206.
MSR-2013-MacLeanK #commit #network #social- Apache commits: social network dataset (ACM, CDK), pp. 135–138.
MSR-2013-RaemaekersDV #dependence #metric #repository- The maven repository dataset of metrics, changes, and dependencies (SR, AvD, JV), pp. 221–224.
MSR-2013-Squire - Project roles in the apache software foundation: a dataset (MS), pp. 301–304.
MSR-2013-Squire13a #twitter- Apache-affiliated twitter screen names: a dataset (MS), pp. 305–308.
MSR-2013-VasilescuSM #re-engineering- A historical dataset of software engineering conferences (BV, AS, TM), pp. 373–376.
MSR-2013-WagstromJS #graph #network #ruby- A network of rails: a graph dataset of ruby on rails and associated projects (PW, CJ, AS), pp. 229–232.
CSCW-2013-RostBCB #challenge #communication #representation #scalability #social #social media- Representation and communication: challenges in interpreting large social media datasets (MR, LB, HC, BB), pp. 357–362.
CIKM-2013-GilpinQD #clustering #performance #scalability- Efficient hierarchical clustering of large high dimensional datasets (SG, BQ, ID), pp. 1371–1380.
MLDM-2013-AllahSG #algorithm #array #mining #performance #scalability- An Efficient and Scalable Algorithm for Mining Maximal — High Confidence Rules from Microarray Dataset (WZAA, YKES, FFMG), pp. 352–366.
MLDM-2013-ParraL #clustering #using- Unsupervised Tagging of Spanish Lyrics Dataset Using Clustering (FLP, EL), pp. 130–143.
RecSys-2013-Aiolli #performance #recommendation #scalability- Efficient top-n recommendation for very large scale binary rated datasets (FA), pp. 273–280.
SAC-2013-HapfelmeierSK #incremental #linear #performance- Incremental linear model trees on massive datasets: keep it simple, keep it fast (AH, JS, SK), pp. 129–135.
DATE-2013-StergiouJ #optimisation- Optimizing BDDs for time-series dataset manipulation (SS, JJ), pp. 1018–1021.
HPDC-2013-SuAWMWA #distributed #using- Taming massive distributed datasets: data sampling using bitmap indices (YS, GA, JW, KM, JW, JPA), pp. 13–24.
HPDC-2013-YinLBGN #order #performance #pipes and filters #using- Efficient analytics on ordered datasets using MapReduce (JY, YL, MB, LG, AN), pp. 125–126.
DRR-2012-WalkerLR #documentation #image- A synthetic document image dataset for developing and evaluating historical document processing methods (DDW, WBL, EKR).
TPDL-2012-BolandREM #identification- Identifying References to Datasets in Publications (KB, DR, KE, BM), pp. 150–161.
VLDB-2012-Shirani-MehrKS #evaluation #performance #query #reachability #scalability- Efficient Reachability Query Evaluation in Large Spatiotemporal Contact Datasets (HSM, FBK, CS), pp. 848–859.
CHI-2012-FisherPDs #incremental #performance #scalability #trust #visualisation- Trust me, I’m partially right: incremental visualization lets analysts explore large datasets faster (DF, IOP, SMD, MMCS), pp. 1673–1682.
ICPR-2012-BoomHHF #clustering #image #using- Supporting ground-truth annotation of image datasets using clustering (BJB, PXH, JH, RBF), pp. 1542–1545.
ICPR-2012-ChenWY #clustering #graph- Centroid-based clustering for graph datasets (LC, SW, XY), pp. 2144–2147.
ICPR-2012-FausserS #clustering #kernel #scalability- Clustering large datasets with kernel methods (SF, FS), pp. 501–504.
ICPR-2012-MogelmoseTM #comparative #detection #evaluation #learning- Learning to detect traffic signs: Comparative evaluation of synthetic and real-world datasets (AM, MMT, TBM), pp. 3452–3455.
ICPR-2012-NafchiK #image #representation- Rectangular based binary image representation: Theory, applications, and dataset introduction (HZN, HRK), pp. 190–193.
ICPR-2012-TakalaP #identification #named #network #people- CMV100: A dataset for people tracking and re-identification in sparse camera networks (VT, MP), pp. 1387–1390.
ICPR-2012-TanLZ - The dataset system of Economic Dispute handwritten (DSEDH) based on stroke shape and structure features (JT, JHL, XXZ), pp. 661–664.
ICPR-2012-Utasi #classification- Weighted conditional mutual information based boosting for classification of imbalanced datasets (ÁU), pp. 2711–2714.
KDD-2012-ShiA #mobile #recommendation- GetJar mobile application recommendations with very sparse datasets (KS, KA), pp. 204–212.
KDIR-2012-KharbatBO #algorithm #case study- A New Compaction Algorithm for LCS Rules — Breast Cancer Dataset Case Study (FK, LB, MO), pp. 382–385.
KDIR-2012-Vanetik #classification- Classification of Datasets with Frequent Itemsets is Wild (NV), pp. 386–389.
MLDM-2012-JoutsijokiJ #case study- DAGSVM vs. DAGKNN: An Experimental Case Study with Benthic Macroinvertebrate Dataset (HJ, MJ), pp. 439–453.
SIGIR-2012-HuO #classification #using- Genre classification for million song dataset using confidence-based classifiers combination (YH, MO), pp. 1083–1084.
SAC-2012-GomiI #3d #image #mobile #multi #named- MINI: a 3D mobile image browser with multi-dimensional datasets (AG, TI), pp. 989–996.
HPDC-2012-HefeedaGA #approximate #clustering #distributed #scalability- Distributed approximate spectral clustering for large-scale datasets (MH, FG, WAA), pp. 223–234.
ICDAR-2011-AlaeiNP #benchmark #documentation #metric #segmentation- A Benchmark Kannada Handwritten Document Dataset and Its Segmentation (AA, PN, UP), pp. 141–145.
ICDAR-2011-QuiniouMSVMPM #named- HAMEX — A Handwritten and Audio Dataset of Mathematical Expressions (SQ, HM, SPS, CVG, EM, SP, SM), pp. 452–456.
SIGMOD-2011-DuanKSU #benchmark #comparison #metric #rdf- Apples and oranges: a comparison of RDF benchmarks and real RDF datasets (SD, AK, KS, OU), pp. 145–156.
SIGMOD-2011-KashyapP #agile #development #interface #query #xml- Rapid development of web-based query interfaces for XML datasets with QURSED (AK, MP), pp. 1339–1342.
VLDB-2012-BarskyKWH11 #correlation #mining #scalability #taxonomy- Mining Flipping Correlations from Large Datasets with Taxonomies (MB, SK, TW, JH), pp. 370–381.
OCSC-2011-AliprandiRMTM #rdf #semantics #web #wiki- Extracting Events from Wikipedia as RDF Triples Linked to Widespread Semantic Web Datasets (CA, FR, AM, MT, SM), pp. 90–99.
CIKM-2011-CachedaCFF #algorithm #analysis #nearest neighbour- Improving k-nearest neighbors algorithms: practical application of dataset analysis (FC, VC, DF, VF), pp. 2253–2256.
CIKM-2011-LiuYS #query- Subject-oriented top-k hot region queries in spatial dataset (JL, GY, HS), pp. 2409–2412.
CIKM-2011-SelvarajBSS #classification- Semi-supervised SVMs for classification with unknown class proportions and a small labeled dataset (SKS, BB, SS, SKS), pp. 653–662.
KDD-2011-CordeiroTTLKF #clustering #multi #pipes and filters #scalability- Clustering very large multi-dimensional datasets with MapReduce (RLFC, CTJ, AJMT, JL, UK, CF), pp. 690–698.
SIGIR-2011-LeeHWHS #graph #image #learning #multi #pipes and filters #scalability #using- Multi-layer graph-based semi-supervised learning for large-scale image datasets using mapreduce (WYL, LCH, GLW, WHH, YFS), pp. 1121–1122.
SIGMOD-2010-WangWLWWLTXL #detection #named- MapDupReducer: detecting near duplicates over massive datasets (CW, JW, XL, WW, HW, HL, WT, JX, RL), pp. 1119–1122.
VLDB-2010-LiD #probability #ranking- Ranking Continuous Probabilistic Datasets (JL, AD), pp. 638–649.
VLDB-2010-Matsudaira #3d #biology #scalability- High-End Biological Imaging Generates Very Large 3D+ and Dynamic Datasets (PM), p. 3.
VLDB-2010-MelnikGLRSTV #analysis #interactive #named- Dremel: Interactive Analysis of Web-Scale Datasets (SM, AG, JJL, GR, SS, MT, TV), pp. 330–339.
VLDB-2010-ParameswaranGR #concept #scalability #towards #web- Towards The Web of Concepts: Extracting Concepts from Large Datasets (AGP, HGM, AR), pp. 566–577.
MSR-2010-BachmannB #correlation #debugging #process #quality #re-engineering- When process data quality affects the number of bugs: Correlations in software engineering datasets (AB, AB), pp. 62–71.
WCRE-2010-NguyenAH #bias #case study #debugging- A Case Study of Bias in Bug-Fix Datasets (THDN, BA, AEH), pp. 259–268.
CIKM-2010-CormodeKW #algorithm #scalability #set- Set cover algorithms for very large datasets (GC, HJK, AW), pp. 479–488.
CIKM-2010-HarpaleYGHY #multi #named #performance #personalisation- CiteData: a new multi-faceted dataset for evaluating personalized search performance (AH, YY, SG, DH, ZY), pp. 549–558.
ICML-2010-SyedR #identification- Unsupervised Risk Stratification in Clinical Datasets: Identifying Patients at Risk of Rare Outcomes (ZS, IR), pp. 1023–1030.
ICML-2010-TanWT #feature model #learning- Learning Sparse SVM for Feature Selection on Very High Dimensional Datasets (MT, LW, IWT), pp. 1047–1054.
ICPR-2010-AlvarezSVO - Perceptual Color Texture Codebooks for Retrieving in Highly Diverse Texture Datasets (SÁ, AS, MV, XO), pp. 866–869.
ICPR-2010-KimM #classification- Dense Structure Inference for Object Classification in Aerial LIDAR Dataset (EK, GGM), pp. 3049–3052.
ICPR-2010-PapaCF #classification #optimisation- Optimizing Optimum-Path Forest Classification for Huge Datasets (JPP, FAMC, AXF), pp. 4162–4165.
ICPR-2010-SodaI #composition #integration #learning- Decomposition Methods and Learning Approaches for Imbalanced Dataset: An Experimental Integration (PS, GI), pp. 3117–3120.
ICPR-2010-StottingerZKH #evaluation- FeEval: A Dataset for Evaluation of Spatio-temporal Local Features (JS, SZ, RK, AH), pp. 499–502.
RecSys-2010-PilaszyZT #feedback #matrix #performance- Fast als-based matrix factorization for explicit and implicit feedback datasets (IP, DZ, DT), pp. 71–78.
PLDI-2010-ChenHEFPTW #optimisation- Evaluating iterative optimization across 1000 datasets (YC, YH, LE, GF, LP, OT, CW), pp. 448–459.
SAC-2010-BaralisCC #mining #persistent #scalability- A persistent HY-Tree to efficiently support itemset mining on large datasets (EB, TC, SC), pp. 1060–1064.
SAC-2010-LuccheseOP #generative #mining- A generative pattern model for mining binary datasets (CL, SO, RP), pp. 1109–1110.
STOC-2010-BravermanO #independence- Measuring independence of datasets (VB, RO), pp. 271–280.
ICDAR-2009-AntonacopoulosBPP #analysis #documentation #evaluation #layout #performance- A Realistic Dataset for Performance Evaluation of Document Layout Analysis (AA, DB, CP, SP), pp. 296–300.
CHI-2009-LeeSRCT #named #roadmap- FacetLens: exposing trends and relationships to support sensemaking within faceted datasets (BL, GS, GGR, MC, DST), pp. 1293–1302.
CIKM-2009-BalachandranPK #clustering #configuration management #documentation- Interpretable and reconfigurable clustering of document datasets by deriving word-based rules (VB, DP, DK), pp. 1773–1776.
CIKM-2009-Muntes-MuleroN #privacy #scalability- Privacy and anonymization for very large datasets (VMM, JN), pp. 2117–2118.
CIKM-2009-StoyanovichA #clustering- Rank-aware clustering of structured datasets (JS, SAY), pp. 1429–1432.
KDD-2009-DundarHBRR #case study #detection #learning #using- Learning with a non-exhaustive training dataset: a case study: detection of bacteria cultures using optical-scattering technology (MD, EDH, AKB, JPR, BR), pp. 279–288.
MLDM-2009-CelepcikayEO #using- Regional Pattern Discovery in Geo-referenced Datasets Using PCA (OUC, CFE, CO), pp. 719–733.
MLDM-2009-SegataB #performance #scalability- Fast Local Support Vector Machines for Large Datasets (NS, EB), pp. 295–310.
ESEC-FSE-2009-BirdBADBFD #bias #debugging- Fair and balanced?: bias in bug-fix datasets (CB, AB, EA, JD, AB, VF, PTD), pp. 121–130.
SAC-2009-OwensMR #mining- Capturing truthiness: mining truth tables in binary datasets (CCOI, TMM, NR), pp. 1467–1474.
ECDL-2008-BindingMT #semantics- Semantic Interoperability in Archaeological Datasets: Data Mapping and Extraction Via the CIDOC CRM (CB, KM, DT), pp. 280–290.
SIGMOD-2008-PanZW #clustering #composition #matrix #named #performance #scalability- CRD: fast co-clustering on large datasets utilizing sampling-based matrix decomposition (FP, XZ, WW), pp. 173–184.
EDM-2008-VenturaRH #education #evaluation #framework #metric- Analyzing Rule Evaluation Measures with Educational Datasets: A Framework to Help the Teacher (SV, CR, CH), pp. 177–181.
ICEIS-DISI-2008-MuldnerML #data access #policy #xml- Succinct Access Control Policies for Published XML Datasets (TM, JKM, GL), pp. 380–385.
CIKM-2008-ChoudharyMB #evolution #on the- On quantifying changes in temporally evolving dataset (RC, SM, AB), pp. 1459–1460.
CIKM-2008-LiuLNBMG #feature model #performance #preprocessor #realtime #scalability- Real-time data pre-processing technique for efficient feature extraction in large scale datasets (YL, LVL, RSN, KB, PM, CLG), pp. 981–990.
CIKM-2008-NguyenS #analysis #correlation #performance- Fast correlation analysis on time series datasets (PN, NS), pp. 787–796.
CIKM-2008-ShawXG #approximate- Deriving non-redundant approximate association rules from hierarchical datasets (GS, YX, SG), pp. 1451–1452.
ICML-2008-WolfeHK #distributed #scalability- Fully distributed EM for very large datasets (JW, AH, DK), pp. 1184–1191.
ICPR-2008-WatanabeK08a #scalability- RANSAC-SVM for large-scale datasets (KW, TK), pp. 1–4.
KDD-2008-DasSN #category theory #detection- Anomaly pattern detection in categorical datasets (KD, JGS, DBN), pp. 169–176.
FSE-2008-OsterweilCEPWBH #experience #process #using #workflow- Experience in using a process language to define scientific workflow and generate dataset provenance (LJO, LAC, AME, RMP, AEW, ERB, JLH), pp. 319–329.
SIGMOD-2007-XiaoT #named #privacy #towards- M-invariance: towards privacy preserving re-publication of dynamic datasets (XX, YT), pp. 689–700.
HIMI-IIE-2007-CollinsNMRBCPFHP #collaboration- The Karst Collaborative Workspace for Analyzing and Annotating Scientific Datasets (LMC, DEN, MLBM, JVR, MAB, CRC, JEP, BFS, SKH, JCP), pp. 3–12.
KDD-2007-DasS #category theory #detection- Detecting anomalous records in categorical datasets (KD, JGS), pp. 220–229.
PADL-2007-Costa #performance #prolog- Prolog Performance on Larger Datasets (VSC), pp. 185–199.
DRR-2006-ZhangA #towards- Toward quantifying the amount of style in a dataset (XZ, SA).
SIGMOD-2006-KiferG #injection- Injecting utility into anonymized datasets (DK, JG), pp. 217–228.
VLDB-2006-GemullaLH #evolution #maintenance- A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets (RG, WL, PJH), pp. 595–606.
VLDB-2006-JiTT #3d #mining- Mining Frequent Closed Cubes in 3D Datasets (LJ, KLT, AKHT), pp. 811–822.
ICPR-v1-2006-AnC - Finding Rule Groups to Classify High Dimensional Gene Expression Datasets (JA, YPPC), pp. 1196–1199.
ICPR-v3-2006-FelsbergG #multi #named #robust #scalability- P-Channels: Robust Multivariate M-Estimation of Large Datasets (MF, GHG), pp. 262–267.
ICPR-v4-2006-Jun #comparison #detection- A Peer Dataset Comparison Outlier Detection Model Applied to Financial Surveillance (TJ), pp. 900–903.
SAC-2006-FangG #bound- Boundary surface extraction and rendering for volume datasets (SF, PG), pp. 1356–1360.
SAC-2006-NemalhabibS #algorithm #category theory #clustering #named- CLUC: a natural clustering algorithm for categorical datasets based on cohesion (AN, NS), pp. 637–638.
ICDAR-2005-AgrawalBMV #named #online #representation #xml- UPX: A New XML Representation for Annotated Datasets of Online Handwriting Data (MA, KB, SM, LV), pp. 1161–1165.
ICEIS-v2-2005-DoP #mining #scalability #visualisation- Mining Very Large Datasets with SVM and Visualization (TND, FP), pp. 127–141.
KDD-2005-JinWPPA #graph- Discovering frequent topological structures from graph datasets (RJ, CW, DP, SP, GA), pp. 606–611.
KDD-2005-ZakiPAS #algorithm #category theory #clustering #effectiveness #mining #named- CLICKS: an effective algorithm for mining subspace clusters in categorical datasets (MJZ, MP, IA, TS), pp. 736–742.
MLDM-2005-FerrandizB05a #evaluation- Supervised Evaluation of Dataset Partitions: Advantages and Practice (SF, MB), pp. 600–609.
MLDM-2005-SiaL #clustering #scalability #using- Clustering Large Dynamic Datasets Using Exemplar Points (WS, MML), pp. 163–173.
SIGMOD-2004-CongXPTY #array #named- FARMER: Finding Interesting Rule Groups in Microarray Datasets (GC, AKHT, XX, FP, JY), pp. 143–154.
VLDB-2004-HoweM #algebra- Algebraic Manipulation of Scientific Datasets (BH, DM), pp. 924–935.
CIKM-2004-ChenL #clustering #named #scalability #visualisation- ClusterMap: labeling clusters in large datasets via visualization (KC, LL), pp. 285–293.
CIKM-2004-ChungJM #clustering #mining #using- Mining gene expression datasets using density-based clustering (SC, JJ, DM), pp. 150–151.
ICPR-v4-2004-WillisC #3d #multi #symmetry- Alignment of Multiple Non-Overlapping Axially Symmetric 3D Datasets (ARW, DBC), pp. 96–99.
KDD-2004-TruongLB #learning #random #using- Learning a complex metabolomic dataset using random forests and support vector machines (YT, XL, CB), pp. 835–840.
SIGIR-2004-DavidovGM #categorisation #generative- Parameterized generation of labeled datasets for text categorization based on a hierarchical directory (DD, EG, SM), pp. 250–257.
SAC-2004-AdamJA #detection- Neighborhood based detection of anomalies in high dimensional spatio-temporal sensor datasets (NRA, VPJ, VA), pp. 576–583.
SAC-2004-CarswellGN #multi #semantics #transaction- Wireless spatio-semantic transactions on multimedia datasets (JDC, KG, MN), pp. 1201–1205.
DRR-2003-HauserSSDST - Correcting OCR text by association with historical datasets (SEH, JS, TFS, DDF, SS, GRT), pp. 84–93.
VLDB-2003-LinLYZ #multi #scalability- Multiscale Histograms: Summarizing Topological Relations in Large Spatial Datasets (XL, QL, YY, XZ), pp. 814–825.
ICEIS-v2-2003-DoP #algorithm #mining #scalability- Mining Very Large Datasets with Support Vector Machine Algorithms (TND, FP), pp. 140–147.
ICML-2003-LeskovecS #linear #programming- Linear Programming Boosting for Uneven Datasets (JL, JST), pp. 456–463.
ICML-2003-ZhuWC #scalability- Eliminating Class Noise in Large Datasets (XZ, XW, QC), pp. 920–927.
KDD-2003-El-HajjZ #interactive #matrix #mining #performance #scalability- Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining (MEH, ORZ), pp. 109–118.
KDD-2003-KoyuturkG #framework #named- PROXIMUS: a framework for analyzing very high dimensional discrete-attributed datasets (MK, AG), pp. 147–156.
KDD-2003-PanCTYZ #biology #named- Carpenter: finding closed patterns in long biological datasets (FP, GC, AKHT, JY, MJZ), pp. 637–642.
KDD-2003-PeterCG #algorithm #clustering #scalability- New unsupervised clustering algorithm for large datasets (WP, JC, CG), pp. 643–648.
MLDM-2003-LeleuRBE #mining #named- GO-SPADE: Mining Sequential Patterns over Datasets with Consecutive Repetitions (ML, CR, JFB, GE), pp. 293–306.
SAC-2003-ChenL #clustering #visualisation- Cluster Rendering of Skewed Datasets via Visualization (KC, LL), pp. 909–916.
CIKM-2002-ZhaoK #algorithm #clustering #documentation #evaluation- Evaluation of hierarchical clustering algorithms for document datasets (YZ, GK), pp. 515–524.
KDD-2002-RidgewayM #analysis- Bayesian analysis of massive datasets via particle filters (GR, DM), pp. 5–13.
KDD-2002-TantrumMS #clustering #modelling #scalability- Hierarchical model-based clustering of large datasets through fractionation and refractionation (JT, AM, WS), pp. 183–190.
ICEIS-v1-2001-KotsisWFM #multi #novel #visualisation- Novel Data Visualisation and Exploration in Multidimensional Datasets (NK, GRSW, JDF, DRM), pp. 170–175.
ICML-2001-DomeniconiG #approach #approximate #classification #multi #nearest neighbour #performance #query #scalability- An Efficient Approach for Approximating Multi-dimensional Range Queries and Nearest Neighbor Classification in Large Datasets (CD, DG), pp. 98–105.
KDD-2001-BeygelzimerPM #category theory #performance #scalability #visualisation- Fast ordering of large categorical datasets for better visualization (AB, CSP, SM), pp. 239–244.
KDD-2000-BarbaraC #clustering #using- Using the fractal dimension to cluster datasets (DB, PC), pp. 260–264.
KDD-2000-Yang #3d #interactive #relational #scalability- Interactive exploration of very large relational datasets through 3D dynamic projections (LY), pp. 236–243.
KDD-2000-ZhangDR #constraints #scalability- Exploring constraints to efficiently mine emerging patterns from large high-dimensional datasets (XZ, GD, KR), pp. 310–314.
SIGMOD-1999-MankuRL #online #order #performance #random #scalability #statistics- Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets (GSM, SR, BGL), pp. 251–262.
KDD-1999-DaviesM #network- Bayesian Networks for Lossless Dataset Compression (SD, AWM), pp. 387–391.
VLDB-1998-GehrkeRG #framework #named #performance #scalability- RainForest — A Framework for Fast Decision Tree Construction of Large Datasets (JG, RR, VG), pp. 416–427.
VLDB-1998-KnorrN #algorithm #mining #scalability- Algorithms for Mining Distance-Based Outliers in Large Datasets (EMK, RTN), pp. 392–403.
VLDB-1998-ShuklaDN #multi- Materialized View Selection for Multidimensional Datasets (AS, PD, JFN), pp. 488–499.
KDD-1998-AlsabtiRS #classification #named #scalability- CLOUDS: A Decision Tree Classifier for Large Datasets (KA, SR, VS), pp. 2–8.
KDD-1998-OatesJ #modelling #scalability- Large Datasets Lead to Overly Complex Models: An Explanation and a Solution (TO, DJ), pp. 294–298.
SIGMOD-1997-KornJF #ad hoc #query #scalability #sequence- Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences (FK, HVJ, CF), pp. 289–300.
SIGMOD-1997-LivnyRBCDLMW #named #query #scalability #visualisation- DEVise: Integrated Querying and Visualization of Large Datasets (ML, RR, KSB, GC, DD, SL, JM, RKW), pp. 301–312.
SIGMOD-1997-LivnyRBCDLMW97a #named #query #scalability #visual notation- DEVise: Integrated Querying and Visual Exploration of Large Datasets (ML, RR, KSB, GC, DD, SL, JM, RKW), pp. 517–520.
KDD-1997-ZupanBBC #approach #composition #data mining #mining- A Dataset Decomposition Approach to Data Mining and Machine Discovery (BZ, MB, IB, BC), pp. 299–302.
SIGMOD-1995-FaloutsosL #algorithm #multi #named #performance #visualisation- FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets (CF, KIL), pp. 163–174.
KDD-1995-StolorzNMMSSYNCMF #data mining #mining #performance #scalability- Fast Spatio-Temporal Data Mining of Large Geophysical Datasets (PES, HN, EM, RRM, ECS, JRS, JY, KWN, SYC, CRM, JDF), pp. 300–305.