Tag #dataset
294 papers:
- EDM-2019-JensenHD #detection #modelling #student
- Generalizability of Sensor-Free Affect Detection Models in a Longitudinal Dataset of Tens of Thousands of Students (EJ, SH, SKD).
- ICSME-2019-LevinY #scalability #source code
- Processing Large Datasets of Fined Grained Source Code Changes (SL, AY), pp. 382–385.
- ICSME-2019-NewmanDAPKH19a #open data
- An Open Dataset of Abbreviations and Expansions (CDN, MJD, RSA, AP, DK, EH0), p. 280.
- MSR-2019-AhluwaliaFP #fault #named #predict
- Snoring: a noise in defect prediction datasets (AA, DF, MDP), pp. 63–67.
- MSR-2019-BiswasIHR #python
- Boa meets python: a boa dataset of data science software in python language (SB, MJI, YH, HR), pp. 577–581.
- MSR-2019-JoshiC #agile #git #named
- RapidRelease: a dataset of projects and issues on github with rapid releases (SDJ, SC), pp. 587–591.
- MSR-2019-PietriSZ #development #graph
- The software heritage graph dataset: public software development under one roof (AP, DS, SZ), pp. 138–142.
- MSR-2019-PontaPSBD #open source
- A manually-curated dataset of fixes to vulnerabilities of open-source software (SEP, HP, AS, MB, CD), pp. 383–387.
- MSR-2019-RaduN #debugging #non-functional
- A dataset of non-functional bugs (AR, SN), pp. 399–403.
- MSR-2019-WangSL0 #android #metadata #named #reliability #towards
- RmvDroid: towards a reliable Android malware dataset with app metadata (HW, JS, HL, YG0), pp. 404–408.
- MSR-2019-WickertREDM #encryption #parametricity
- A dataset of parametric cryptographic misuses (AKW, MR, ME, AD, MM), pp. 96–100.
- CIKM-2019-ChenMLZM #named #scalability #web
- TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions (JC, JM, YL, MZ0, SM), pp. 2485–2488.
- CIKM-2019-ChenWCKQ #generative #query #towards
- Towards More Usable Dataset Search: From Query Characterization to Snippet Generation (JC, XW, GC0, EK, YQ), pp. 2445–2448.
- CIKM-2019-Cohen-ShapiraRS #named #recommendation #representation #visual notation
- AutoGRD: Model Recommendation Through Graphical Dataset Representation (NCS, LR, BS, GK, RV), pp. 821–830.
- CIKM-2019-SunAJHS #flexibility #named
- MithraLabel: Flexible Dataset Nutritional Labels for Responsible Data Science (CS, AA, HVJ, BH, JS), pp. 2893–2896.
- ECIR-p1-2019-LinjordetB #modelling
- Impact of Training Dataset Size on Neural Answer Selection Models (TL, KB), pp. 828–835.
- ECIR-p2-2019-Dosso #keyword #rdf
- Keyword Search on RDF Datasets (DD), pp. 332–336.
- ICML-2019-CornishVBDD #scalability
- Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets (RC, PV, ABC, GD, AD), pp. 1351–1360.
- ICML-2019-GhadikolaeiGFS #big data #learning
- Learning and Data Selection in Big Datasets (HSG, HGG, CF, MS), pp. 2191–2200.
- ESEC-FSE-2019-MiryeganehAH #approach #automation #integration #towards
- An IR-based approach towards automated integration of geo-spatial datasets in map-based software systems (NM, MA, HH), pp. 946–954.
- ICSE-2019-DmeiriTWBLDVR #mining #named
- BugSwarm: mining and continuously growing a dataset of reproducible failures and fixes (DAT, ND, YW, AB, YCL, PTD, BV, CRG), pp. 339–349.
- ICTSS-2019-NakajimaC #generative #machine learning #source code #testing
- Generating Biased Dataset for Metamorphic Testing of Machine Learning Programs (SN0, TYC), pp. 56–64.
- ICSME-2018-0008ZOPLB #analysis #re-engineering #sentiment
- Two Datasets for Sentiment Analysis in Software Engineering (BL0, FZ, RO, MDP, ML, GB), p. 712.
- MSR-2018-GaoYJLYZ #concurrent #named #testing
- Jbench: a dataset of data races for concurrency testing (JG, XY, YJ0, HL0, WY, XZ), pp. 6–9.
- MSR-2018-GeigerMPPNB #android #commit #graph
- A graph-based dataset of commit history of real-world Android apps (FXG, IM, LP, FP, DDN, AB), pp. 30–33.
- MSR-2018-GkortzisMS #named #open source #security
- VulinOSS: a dataset of security vulnerabilities in open-source systems (AG, DM, DS), pp. 18–21.
- MSR-2018-MarkovtsevL #git
- Public git archive: a big code dataset for all (VM, WL), pp. 34–37.
- MSR-2018-MartinsAL #java #named
- 50K-C: a dataset of compilable, and compiled, Java projects (PM0, RA, CVL), pp. 1–5.
- MSR-2018-ProkschAN #developer #empirical #process
- Enriched event streams: a general dataset for empirical studies on in-IDE activities of software developers (SP, SA, SN), pp. 62–65.
- MSR-2018-SahaLLYP #debugging #java #scalability
- Bugs.jar: a large-scale, diverse dataset of real-world Java bugs (RKS, YL, WL, HY, MRP), pp. 10–13.
- MSR-2018-XuZ #kernel #linux #multi
- A multi-level dataset of linux kernel patchwork (YX, MZ), pp. 54–57.
- MSR-2018-YuLYWW #git
- A dataset of duplicate pull-requests in github (YY0, ZL, GY, TW0, HW), pp. 22–25.
- SANER-2018-SobreiraDDMM #debugging #fault
- Dissection of a bug dataset: Anatomy of 395 patches from Defects4J (VS, TD, FM, MM, MdAM), pp. 130–140.
- CIG-2018-AungBDCKYW #learning #predict #scalability
- Predicting Skill Learning in a Large, Longitudinal MOBA Dataset (MA, VB, AD, PIC, AVK, CY, ARW), pp. 1–7.
- CIG-2018-VariaTKK #3d #analysis #game studies
- A Refined 3D Dataset for the Analysis of Player Actions in Exertion Games (CV, GT, KK, SDK), pp. 1–4.
- CIKM-2018-AfsharPPSHS #named #scalability
- COPA: Constrained PARAFAC2 for Sparse & Large Datasets (AA, IP, EEP, ES, JCH, JS), pp. 793–802.
- CIKM-2018-HoangVN #benchmark #detection #metric #named #topic
- W2E: A Worldwide-Event Benchmark Dataset for Topic Detection and Tracking (TAH, KDV, WN), pp. 1847–1850.
- CIKM-2018-LevchenkoYAMKS #distributed #named #sketching
- Spark-parSketch: A Massively Distributed Indexing of Time Series Datasets (OL, DEY, RA, FM, BK, DES), pp. 1951–1954.
- ECIR-2018-DurRF #architecture #benchmark #challenge #lessons learnt #metric
- Reproducing a Neural Question Answering Architecture Applied to the SQuAD Benchmark Dataset: Challenges and Lessons Learned (AD, AR, PF), pp. 102–113.
- ICML-2018-YoonJS18a #generative #modelling #multi #named #network #predict #using
- RadialGAN: Leveraging multiple datasets to improve target-specific predictive models using Generative Adversarial Networks (JY, JJ, MvdS), pp. 5685–5693.
- ICPR-2018-ChowdhuryAT0R #recognition
- MSU-AVIS dataset: Fusing Face and Voice Modalities for Biometric Recognition in Indoor Surveillance Videos (AC, YA, LT, XL0, AR), pp. 3567–3573.
- ICPR-2018-FerrariBB #identification
- Extended YouTube Faces: a Dataset for Heterogeneous Open-Set Face Identification (CF, SB, ADB), pp. 3408–3413.
- ICPR-2018-LiangLJXL #benchmark #metric #multi #named #predict
- SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction (LL, LL, LJ, DX, ML), pp. 1598–1603.
- ICPR-2018-MerchantSKM #image #using
- Appearance-based data augmentation for image datasets using contrast preserving sampling (AKM, TQS, BK, RM), pp. 1235–1240.
- ICPR-2018-TranLPHKTNP #analysis #multi
- A multi-modal multi-view dataset for human fall analysis and preliminary investigation on modality (THT, TLL, DTP, VNH, VMK, QTT, TSN, CP), pp. 1947–1952.
- ICPR-2018-TuggenerESPS #classification #detection #segmentation
- DeepScores - A Dataset for Segmentation, Detection and Classification of Tiny Objects (LT, IE, JS, MP, TS), pp. 3704–3709.
- KDD-2018-ChenCL #open data
- Rotation-blended CNNs on a New Open Dataset for Tropical Cyclone Image-to-intensity Regression (BC, BFC, HTL), pp. 90–99.
- KDD-2018-RijnH
- Hyperparameter Importance Across Datasets (JNvR, FH), pp. 2367–2376.
- JCDL-2017-DuretecR0 #benchmark #metric
- A Text Extraction Software Benchmark Based on a Synthesized Dataset (KD, AR, CB0), pp. 109–118.
- JCDL-2017-SinghNGBMG #behaviour #case study #reuse
- Citation Sentence Reuse Behavior of Scientists: A Case Study on Massive Bibliographic Text Dataset of Computer Science (MS0, AN, DG, NAB, AM0, PG), pp. 277–280.
- MSR-2017-AivaloglouHMR #source code
- A dataset of scratch programs: scraped, shaped and scored (EA, FH, JML, GR), pp. 511–514.
- MSR-2017-MadeyskiK #fault #idea #predict
- Continuous defect prediction: the idea and a related dataset (LM, MK), pp. 515–518.
- MSR-2017-OrellanaLMD #difference #integration #on the #testing
- On the differences between unit and integration testing in the travistorrent dataset (GO, GL, AM, SD), pp. 451–454.
- MSR-2017-RoblesHHCF #git #modelling #uml
- An extensive dataset of UML models in GitHub (GR, THQ, RH, MRVC, MAF), pp. 519–522.
- MSR-2017-SadatBM
- Rediscovery datasets: connecting duplicate reports (MS, ABB, AVM), pp. 527–530.
- MSR-2017-ZhuLRC #semantics #version control
- A dataset for dynamic discovery of semantic changes in version controlled software histories (CZ, YL0, JR, MC), pp. 523–526.
- AIIDE-2017-LinGKS #named #research
- STARDATA: A StarCraft AI Research Dataset (ZL, JG, VK, GS), pp. 50–56.
- ECIR-2017-BhattacharjeeA #algorithm #clustering #incremental #nearest neighbour
- Batch Incremental Shared Nearest Neighbor Density Based Clustering Algorithm for Dynamic Datasets (PB, AA), pp. 568–574.
- ICML-2017-ValeraG #automation #statistics
- Automatic Discovery of the Statistical Types of Variables in a Dataset (IV, ZG), pp. 3521–3529.
- ICML-2017-ZhouZIJWS #multi #testing
- When can Multi-Site Datasets be Pooled for Regression? Hypothesis Tests, l₂-consistency and Neuroscience Applications (HHZ, YZ, VKI, SCJ, GW, VS), pp. 4170–4179.
- MSR-2016-CosentinoIC #git
- Findings from GitHub: methods, datasets and limitations (VC, JLCI, JC), pp. 137–141.
- MSR-2016-ProkschANM #c# #syntax
- A dataset of simplified syntax trees for C# (SP, SA, SN, MM), pp. 476–479.
- MSR-2016-YangKYI #code review #mining #overview #people #process #repository
- Mining the modern code review repositories: a dataset of people, process and product (XY, RGK, NY, HI), pp. 460–463.
- MSR-2016-ZhuZM #issue tracking #multi
- Multi-extract and multi-level dataset of mozilla issue tracking history (JZ, MZ, HM), pp. 472–475.
- SANER-2016-KadarHFG #assessment #maintenance #refactoring
- A Code Refactoring Dataset and Its Assessment Regarding Software Maintainability (IK, PH, RF, TG), pp. 599–603.
- CIKM-2016-BleifussBFRW0PN #approximate #dependence #functional #scalability
- Approximate Discovery of Functional Dependencies for Large Datasets (TB, SB, JF, JR, GW, SK0, TP, FN), pp. 1803–1812.
- CIKM-2016-CaoY #benchmark #metric #modelling #named #network #platform #social
- ASNets: A Benchmark Dataset of Aligned Social Networks for Cross-Platform User Modeling (XC, YY0), pp. 1881–1884.
- CIKM-2016-NguyenTTN #named #social #summary
- SoLSCSum: A Linked Sentence-Comment Dataset for Social Context Summarization (MTN, CXT, DVT, MLN), pp. 2409–2412.
- ECIR-2016-BotevaGSR #information retrieval #learning #rank
- A Full-Text Learning to Rank Dataset for Medical Information Retrieval (VB, DGG, AS, SR), pp. 716–722.
- ICPR-2016-MoazzenT #approximate #clustering
- Sampling based approximate spectral clustering ensemble for partitioning datasets (YM, KT), pp. 1630–1635.
- ICPR-2016-MurguiaRA #adaptation #architecture #evaluation #modelling #network #parallel
- Evaluation of the background modeling method Auto-Adaptive Parallel Neural Network Architecture in the SBMnet dataset (MICM, JARQ, GRA), pp. 137–142.
- ICPR-2016-OrtegoSM #estimation #multi #re-engineering
- Rejection based multipath reconstruction for background estimation in SBMnet 2016 dataset (DO, JCS, JMM0), pp. 114–119.
- ICPR-2016-YuenMT #algorithm #evaluation #on the
- On looking at faces in an automobile: Issues, algorithms and evaluation on naturalistic driving dataset (KY, SM, MMT), pp. 2777–2782.
- KDD-2016-GaoP #named #relational
- Squish: Near-Optimal Compression for Archival of Relational Datasets (YG, AGP), pp. 1575–1584.
- KDD-2016-MaiAS #algorithm #clustering #named #performance #scalability
- AnyDBC: An Efficient Anytime Density-based Clustering Algorithm for Very Large Complex Datasets (STM, IA, MS), pp. 1025–1034.
- DRR-2015-ChenSWLHI #analysis #documentation #layout
- Ground truth model, tool, and dataset for layout analysis of historical documents (KC, MS, HW, ML, JH, RI), p. 940204.
- HT-2015-RoutB #algorithm #ranking #twitter
- A Human-annotated Dataset for Evaluating Tweet Ranking Algorithms (DPR, KB), pp. 95–99.
- PODS-2015-Cormode #scalability #summary
- Compact Summaries over Large Datasets (GC), pp. 157–158.
- TPDL-2015-LlewellynGAOT #topic #twitter
- Extracting a Topic Specific Dataset from a Twitter Archive (CL, CG, BA, JO, RT), pp. 364–367.
- VLDB-2015-BhattacherjeeCH #trade-off #version control
- Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff (SB, AC, SH, AD, AGP), pp. 1346–1357.
- VLDB-2015-HarbiAKM #query #rdf
- Evaluating SPARQL Queries on Massive RDF Datasets (RH, IA, PK, NM), pp. 1848–1859.
- EDM-2015-VossSMS #approach #learning #matrix
- A Transfer Learning Approach for Applying Matrix Factorization to Small ITS Datasets (LV, CS, CM, LST), pp. 372–375.
- ICSME-2015-JansenH #industrial #smell #spreadsheet
- Code smells in spreadsheet formulas revisited on an industrial dataset (BJ, FH), pp. 372–380.
- MSR-2015-AltingerSDW #embedded #fault #industrial #modelling #novel #predict
- A Novel Industry Grade Dataset for Fault Prediction Based on Model-Driven Developed Automotive Embedded Software (HA, SS, YD, FW), pp. 494–497.
- MSR-2015-GermanAH #git #linux #process
- A Dataset of the Activity of the Git Super-repository of Linux in 2012 (DMG, BA, AEH), pp. 470–473.
- MSR-2015-HabayebMMBB #fault
- The Firefox Temporal Defect Dataset (MH, AVM, SSM, LB, AB), pp. 498–501.
- MSR-2015-KrutzMMRPFS #android #open source
- A Dataset of Open-Source Android Applications (DEK, MM, SAM, AR, JP, AF, JS), pp. 522–525.
- MSR-2015-MauczkaBSG #commit #developer
- Dataset of Developer-Labeled Commit Messages (AM, FB, CS, TG), pp. 490–493.
- MSR-2015-OhiraKYYMLFHIM #classification #debugging
- A Dataset of High Impact Bugs: Manually-Classified Issue Reports (MO, YK, YY, HY, YM, NL, KF, HH, AI, KiM), pp. 518–521.
- MSR-2015-PalombaNTBOPL #evaluation #named #open data #smell
- Landfill: An Open Dataset of Code Smells with Public Evaluation (FP, DDN, MT, GB, RO, DP, ADL), pp. 482–485.
- MSR-2015-SawantB #api
- A Dataset for API Usage (AAS, AB), pp. 506–509.
- MSR-2015-WermelingerY #architecture #evolution
- An Architectural Evolution Dataset (MW, YY), pp. 502–505.
- MSR-2015-Zacchiroli #metadata #source code
- The Debsources Dataset: Two Decades of Debian Source Code Metadata (SZ), pp. 466–469.
- SCAM-2015-AivaloglouHH #scalability #spreadsheet
- A grammar for spreadsheet formulas evaluated on two large datasets (EA, DH, FH), pp. 121–130.
- VS-Games-2015-ChalasFFSK #3d #generative
- Generation of Variable Human Faces from 3D Scan Dataset (IC, ZF, KF, JS, BK), pp. 1–8.
- CSCW-2015-QuattroneCM #bias
- There’s No Such Thing as the Perfect Map: Quantifying Bias in Spatial Crowd-sourcing Datasets (GQ, LC, PDM), pp. 1021–1032.
- ICEIS-v2-2015-SarinhoLS #linked data #open data #question
- Can You Find All the Data You Expect in a Linked Dataset? (WTS, BFL, DS), pp. 648–655.
- CIKM-2015-LiXJL #adaptation #approach
- Differentially Private Histogram Publication for Dynamic Datasets: an Adaptive Sampling Approach (HL, LX0, XJ, JL), pp. 1001–1010.
- CIKM-2015-SinghPK0MG #case study #predict
- The Role Of Citation Context In Predicting Long-Term Citation Profiles: An Experimental Study Based On A Massive Bibliographic Text Dataset (MS0, VP, SK, TC0, AM0, PG), pp. 1271–1280.
- ICML-2015-BarbosaENW #distributed #power of
- The Power of Randomization: Distributed Submodular Maximization on Massive Datasets (RdPB, AE, HLN, JW), pp. 1236–1244.
- ICML-2015-MaLF #analysis #canonical #correlation #linear #scalability
- Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis (ZM, YL, DPF), pp. 169–178.
- KDD-2015-CaoWYR #online #scalability
- Online Outlier Exploration Over Large Datasets (LC, MW, DY, EAR), pp. 89–98.
- RecSys-2015-Ben-ShimonTFSRH #challenge
- RecSys Challenge 2015 and the YOOCHOOSE Dataset (DBS, AT, MF, BS, LR, JH), pp. 357–358.
- SIGIR-2015-MorenoD #adaptation #metric #semistructured data
- Adapted B-CUBED Metrics to Unbalanced Datasets (JGM, GD), pp. 911–914.
- ASE-2015-NamK #fault #named #predict
- CLAMI: Defect Prediction on Unlabeled Datasets (T) (JN, SK), pp. 452–463.
- ICSE-v2-2015-HermansM #analysis #email #spreadsheet
- Enron’s Spreadsheets and Related Emails: A Dataset and Analysis (FH, ERMH), pp. 7–16.
- SAC-2015-RochaRCOMVADGF #algorithm #classification #documentation #named #performance #using
- G-KNN: an efficient document classification algorithm for sparse datasets on GPUs using KNN (LCdR, GSR, RC, RSO, DM, FV, GA, SD, MAG, RF), pp. 1335–1338.
- DRR-2014-BrunoL #documentation #open data #recognition #research
- The Lehigh Steel Collection: a new open dataset for document recognition research (BB, DPL), p. ?–9.
- JCDL-2014-CastroSR #data transformation #lightweight #ontology #research #workflow
- Creating lightweight ontologies for dataset description: practical applications in a cross-domain research data management workflow (JAC, JRdS, CR), pp. 313–316.
- JCDL-2014-HasanGFBM #framework #library
- Data mapping framework in a digital library with computational epidemiology datasets (SMSH, SG, EAF, KRB, MVM), pp. 449–450.
- JCDL-2014-LlewellynRBKSJ #information management
- Building a dataset of sensitive information (CL, LR, RB, SK, MS, RvJ), pp. 493–494.
- SIGMOD-2014-SatishSPSPHSYD #framework #graph #navigation #using
- Navigating the maze of graph analytics frameworks using massive graph datasets (NS, NS, MMAP, JS, JP, MAH, SS, ZY, PD), pp. 979–990.
- VLDB-2015-MozafariSFJM14 #learning #scalability
- Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning (BM, PS, MJF, MIJ, SM), pp. 125–136.
- EDM-2014-LiuMK #testing
- Interpreting model discovery and testing generalization to a new dataset (RL0, EAM, KRK), pp. 107–113.
- ICSME-2014-ThongtanunamYYKCFI #code review #named #overview #visualisation
- ReDA: A Web-Based Visualization Tool for Analyzing Modern Code Review Dataset (PT, XY, NY, RGK, AECC, KF, HI), pp. 605–608.
- MSR-2014-GousiosZ #development #research
- A dataset for pull-based development research (GG, AZ), pp. 368–371.
- MSR-2014-LazarRS14a #debugging #generative
- Generating duplicate bug datasets (AL, SR, BS), pp. 392–395.
- MSR-2014-MurakamiHK
- A dataset of clone references with gaps (HM, YH, SK), pp. 412–415.
- MSR-2014-PassosC #feature model #kernel #linux
- A dataset of feature additions and feature removals from the Linux kernel (LTP, KC), pp. 376–379.
- MSR-2014-RoblesRSVG #challenge #overview
- FLOSS 2013: a survey dataset about free software contributors: challenges for curating, sharing, and combining (GR, LAR, AS, BV, JMGB), pp. 396–399.
- MSR-2014-SainiSOL #debugging
- A dataset for maven artifacts and bug patterns found in them (VS, HS, JO, CVL), pp. 416–419.
- MSR-2014-WilliamsRMRK #modelling
- Models of OSS project meta-information: a dataset of three forges (JRW, DDR, NDM, JDR, DSK), pp. 408–411.
- MSR-2014-ZhangH #energy #mining
- A green miner’s dataset: mining the impact of software change on energy consumption (CZ, AH), pp. 400–403.
- HCI-AIMT-2014-RuffieuxLMK #gesture #overview #recognition
- A Survey of Datasets for Human Gesture Recognition (SR, DL, EM, OAK), pp. 337–348.
- HIMI-DE-2014-GombosK #query #recommendation
- SPARQL Query Writing with Recommendations Based on Datasets (GG, AK), pp. 310–319.
- ICEIS-v1-2014-TimoteoVF #analysis #case study #network #project management
- Evaluating Artificial Neural Networks and Traditional Approaches for Risk Analysis in Software Project Management — A Case Study with PERIL Dataset (CT, MV, SF), pp. 472–479.
- ECIR-2014-BelloginSVS #challenge #evaluation #web
- Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track (AB, TS, APdV, AS), pp. 430–436.
- ECIR-2014-CarageaWCWRCWG #big data
- CiteSeerX: A Scholarly Big Dataset (CC, JW, AMC, KW, JPFR, HHC, ZW, CLG), pp. 311–322.
- ICPR-2014-Calvo-ZaragozaO #music #recognition
- Recognition of Pen-Based Music Notation: The HOMUS Dataset (JCZ, JO), pp. 3038–3043.
- ICPR-2014-FletcherI #evaluation #quality
- Quality Evaluation of an Anonymized Dataset (SF, MZI), pp. 3594–3599.
- ICPR-2014-SandhanC #hybrid #pattern matching #pattern recognition #recognition
- Handling Imbalanced Datasets by Partially Guided Hybrid Sampling for Pattern Recognition (TS, JYC), pp. 1449–1453.
- ICPR-2014-WangS #automation #multi #segmentation #using
- Automatic Multi-organ Segmentation in Non-enhanced CT Datasets Using Hierarchical Shape Priors (CW, ÖS), pp. 3327–3332.
- MLDM-2014-JavedA #classification #network #social #using
- Creation of Bi-lingual Social Network Dataset Using Classifiers (IJ, HA), pp. 523–533.
- MLDM-2014-WaiyamaiS #approach #classification
- A Cost-Sensitive Based Approach for Improving Associative Classification on Imbalanced Datasets (KW, PS), pp. 31–42.
- HPDC-2014-SuAWBS #analysis #correlation #distributed #parallel
- Supporting correlation analysis on scientific datasets in parallel and distributed settings (YS, GA, JW, AB, HWS), pp. 191–202.
- ICDAR-2013-ShivramRSG #named
- IBM_UB_1: A Dual Mode Unconstrained English Handwriting Dataset (AS, CR, SS, VG), pp. 13–17.
- JCDL-2013-GozaliKS #library
- Constructing an anonymous dataset from the personal digital photo libraries of mac app store users (JPG, MYK, HS), pp. 305–308.
- MSR-2013-BinkleyLPHV #identifier
- A dataset for evaluating identifier splitters (DB, DL, LLP, EH, KVS), pp. 401–404.
- MSR-2013-ButlerWYS #identifier #named
- INVocD: identifier name vocabulary dataset (SB, MW, YY, HS), pp. 405–408.
- MSR-2013-DitHPK #evaluation #maintenance
- A dataset from change history to support evaluation of software maintenance tasks (BD, AH, DP, HHK), pp. 131–134.
- MSR-2013-GoeminneCM #ecosystem #gnome
- A historical dataset for the gnome ecosystem (MG, MC, TM), pp. 225–228.
- MSR-2013-Gousios
- The GHTorrent dataset and tool suite (GG), pp. 233–236.
- MSR-2013-HamasakiKYCFI #code review #overview #repository #what
- Who does what during a code review? datasets of OSS peer review repositories (KH, RGK, NY, AECC, KF, HI), pp. 49–52.
- MSR-2013-JanjicHSA #research #reuse #source code
- An unabridged source code dataset for research in software reuse (WJ, OH, MS, CA), pp. 339–342.
- MSR-2013-LamkanfiPD #debugging #eclipse #fault #mining
- The eclipse and mozilla defect tracking dataset: a genuine dataset for mining bug information (AL, JP, SD), pp. 203–206.
- MSR-2013-MacLeanK #commit #network #social
- Apache commits: social network dataset (ACM, CDK), pp. 135–138.
- MSR-2013-RaemaekersDV #dependence #metric #repository
- The maven repository dataset of metrics, changes, and dependencies (SR, AvD, JV), pp. 221–224.
- MSR-2013-Squire
- Project roles in the apache software foundation: a dataset (MS), pp. 301–304.
- MSR-2013-Squire13a #twitter
- Apache-affiliated twitter screen names: a dataset (MS), pp. 305–308.
- MSR-2013-VasilescuSM #re-engineering
- A historical dataset of software engineering conferences (BV, AS, TM), pp. 373–376.
- MSR-2013-WagstromJS #graph #network #ruby
- A network of rails: a graph dataset of ruby on rails and associated projects (PW, CJ, AS), pp. 229–232.
- CSCW-2013-RostBCB #challenge #communication #representation #scalability #social #social media
- Representation and communication: challenges in interpreting large social media datasets (MR, LB, HC, BB), pp. 357–362.
- CIKM-2013-GilpinQD #clustering #performance #scalability
- Efficient hierarchical clustering of large high dimensional datasets (SG, BQ, ID), pp. 1371–1380.
- MLDM-2013-AllahSG #algorithm #array #mining #performance #scalability
- An Efficient and Scalable Algorithm for Mining Maximal — High Confidence Rules from Microarray Dataset (WZAA, YKES, FFMG), pp. 352–366.
- MLDM-2013-ParraL #clustering #using
- Unsupervised Tagging of Spanish Lyrics Dataset Using Clustering (FLP, EL), pp. 130–143.
- RecSys-2013-Aiolli #performance #recommendation #scalability
- Efficient top-n recommendation for very large scale binary rated datasets (FA), pp. 273–280.
- SAC-2013-HapfelmeierSK #incremental #linear #performance
- Incremental linear model trees on massive datasets: keep it simple, keep it fast (AH, JS, SK), pp. 129–135.
- DATE-2013-StergiouJ #optimisation
- Optimizing BDDs for time-series dataset manipulation (SS, JJ), pp. 1018–1021.
- HPDC-2013-SuAWMWA #distributed #using
- Taming massive distributed datasets: data sampling using bitmap indices (YS, GA, JW, KM, JW, JPA), pp. 13–24.
- HPDC-2013-YinLBGN #order #performance #pipes and filters #using
- Efficient analytics on ordered datasets using MapReduce (JY, YL, MB, LG, AN), pp. 125–126.
- DRR-2012-WalkerLR #documentation #image
- A synthetic document image dataset for developing and evaluating historical document processing methods (DDW, WBL, EKR).
- TPDL-2012-BolandREM #identification
- Identifying References to Datasets in Publications (KB, DR, KE, BM), pp. 150–161.
- VLDB-2012-Shirani-MehrKS #evaluation #performance #query #reachability #scalability
- Efficient Reachability Query Evaluation in Large Spatiotemporal Contact Datasets (HSM, FBK, CS), pp. 848–859.
- CHI-2012-FisherPDs #incremental #performance #scalability #trust #visualisation
- Trust me, I’m partially right: incremental visualization lets analysts explore large datasets faster (DF, IOP, SMD, MMCS), pp. 1673–1682.
- ICPR-2012-BoomHHF #clustering #image #using
- Supporting ground-truth annotation of image datasets using clustering (BJB, PXH, JH, RBF), pp. 1542–1545.
- ICPR-2012-ChenWY #clustering #graph
- Centroid-based clustering for graph datasets (LC, SW, XY), pp. 2144–2147.
- ICPR-2012-FausserS #clustering #kernel #scalability
- Clustering large datasets with kernel methods (SF, FS), pp. 501–504.
- ICPR-2012-MogelmoseTM #comparative #detection #evaluation #learning
- Learning to detect traffic signs: Comparative evaluation of synthetic and real-world datasets (AM, MMT, TBM), pp. 3452–3455.
- ICPR-2012-NafchiK #image #representation
- Rectangular based binary image representation: Theory, applications, and dataset introduction (HZN, HRK), pp. 190–193.
- ICPR-2012-TakalaP #identification #named #network #people
- CMV100: A dataset for people tracking and re-identification in sparse camera networks (VT, MP), pp. 1387–1390.
- ICPR-2012-TanLZ
- The dataset system of Economic Dispute handwritten (DSEDH) based on stroke shape and structure features (JT, JHL, XXZ), pp. 661–664.
- ICPR-2012-Utasi #classification
- Weighted conditional mutual information based boosting for classification of imbalanced datasets (ÁU), pp. 2711–2714.
- KDD-2012-ShiA #mobile #recommendation
- GetJar mobile application recommendations with very sparse datasets (KS, KA), pp. 204–212.
- KDIR-2012-KharbatBO #algorithm #case study
- A New Compaction Algorithm for LCS Rules — Breast Cancer Dataset Case Study (FK, LB, MO), pp. 382–385.
- KDIR-2012-Vanetik #classification
- Classification of Datasets with Frequent Itemsets is Wild (NV), pp. 386–389.
- MLDM-2012-JoutsijokiJ #case study
- DAGSVM vs. DAGKNN: An Experimental Case Study with Benthic Macroinvertebrate Dataset (HJ, MJ), pp. 439–453.
- SIGIR-2012-HuO #classification #using
- Genre classification for million song dataset using confidence-based classifiers combination (YH, MO), pp. 1083–1084.
- SAC-2012-GomiI #3d #image #mobile #multi #named
- MINI: a 3D mobile image browser with multi-dimensional datasets (AG, TI), pp. 989–996.
- HPDC-2012-HefeedaGA #approximate #clustering #distributed #scalability
- Distributed approximate spectral clustering for large-scale datasets (MH, FG, WAA), pp. 223–234.
- ICDAR-2011-AlaeiNP #benchmark #documentation #metric #segmentation
- A Benchmark Kannada Handwritten Document Dataset and Its Segmentation (AA, PN, UP), pp. 141–145.
- ICDAR-2011-QuiniouMSVMPM #named
- HAMEX — A Handwritten and Audio Dataset of Mathematical Expressions (SQ, HM, SPS, CVG, EM, SP, SM), pp. 452–456.
- SIGMOD-2011-DuanKSU #benchmark #comparison #metric #rdf
- Apples and oranges: a comparison of RDF benchmarks and real RDF datasets (SD, AK, KS, OU), pp. 145–156.
- SIGMOD-2011-KashyapP #agile #development #interface #query #xml
- Rapid development of web-based query interfaces for XML datasets with QURSED (AK, MP), pp. 1339–1342.
- VLDB-2012-BarskyKWH11 #correlation #mining #scalability #taxonomy
- Mining Flipping Correlations from Large Datasets with Taxonomies (MB, SK, TW, JH), pp. 370–381.
- OCSC-2011-AliprandiRMTM #rdf #semantics #web #wiki
- Extracting Events from Wikipedia as RDF Triples Linked to Widespread Semantic Web Datasets (CA, FR, AM, MT, SM), pp. 90–99.
- CIKM-2011-CachedaCFF #algorithm #analysis #nearest neighbour
- Improving k-nearest neighbors algorithms: practical application of dataset analysis (FC, VC, DF, VF), pp. 2253–2256.
- CIKM-2011-LiuYS #query
- Subject-oriented top-k hot region queries in spatial dataset (JL, GY, HS), pp. 2409–2412.
- CIKM-2011-SelvarajBSS #classification
- Semi-supervised SVMs for classification with unknown class proportions and a small labeled dataset (SKS, BB, SS, SKS), pp. 653–662.
- KDD-2011-CordeiroTTLKF #clustering #multi #pipes and filters #scalability
- Clustering very large multi-dimensional datasets with MapReduce (RLFC, CTJ, AJMT, JL, UK, CF), pp. 690–698.
- SIGIR-2011-LeeHWHS #graph #image #learning #multi #pipes and filters #scalability #using
- Multi-layer graph-based semi-supervised learning for large-scale image datasets using mapreduce (WYL, LCH, GLW, WHH, YFS), pp. 1121–1122.
- SIGMOD-2010-WangWLWWLTXL #detection #named
- MapDupReducer: detecting near duplicates over massive datasets (CW, JW, XL, WW, HW, HL, WT, JX, RL), pp. 1119–1122.
- VLDB-2010-LiD #probability #ranking
- Ranking Continuous Probabilistic Datasets (JL, AD), pp. 638–649.
- VLDB-2010-Matsudaira #3d #biology #scalability
- High-End Biological Imaging Generates Very Large 3D+ and Dynamic Datasets (PM), p. 3.
- VLDB-2010-MelnikGLRSTV #analysis #interactive #named
- Dremel: Interactive Analysis of Web-Scale Datasets (SM, AG, JJL, GR, SS, MT, TV), pp. 330–339.
- VLDB-2010-ParameswaranGR #concept #scalability #towards #web
- Towards The Web of Concepts: Extracting Concepts from Large Datasets (AGP, HGM, AR), pp. 566–577.
- MSR-2010-BachmannB #correlation #debugging #process #quality #re-engineering
- When process data quality affects the number of bugs: Correlations in software engineering datasets (AB, AB), pp. 62–71.
- WCRE-2010-NguyenAH #bias #case study #debugging
- A Case Study of Bias in Bug-Fix Datasets (THDN, BA, AEH), pp. 259–268.
- CIKM-2010-CormodeKW #algorithm #scalability #set
- Set cover algorithms for very large datasets (GC, HJK, AW), pp. 479–488.
- CIKM-2010-HarpaleYGHY #multi #named #performance #personalisation
- CiteData: a new multi-faceted dataset for evaluating personalized search performance (AH, YY, SG, DH, ZY), pp. 549–558.
- ICML-2010-SyedR #identification
- Unsupervised Risk Stratification in Clinical Datasets: Identifying Patients at Risk of Rare Outcomes (ZS, IR), pp. 1023–1030.
- ICML-2010-TanWT #feature model #learning
- Learning Sparse SVM for Feature Selection on Very High Dimensional Datasets (MT, LW, IWT), pp. 1047–1054.
- ICPR-2010-AlvarezSVO
- Perceptual Color Texture Codebooks for Retrieving in Highly Diverse Texture Datasets (SÁ, AS, MV, XO), pp. 866–869.
- ICPR-2010-KimM #classification
- Dense Structure Inference for Object Classification in Aerial LIDAR Dataset (EK, GGM), pp. 3049–3052.
- ICPR-2010-PapaCF #classification #optimisation
- Optimizing Optimum-Path Forest Classification for Huge Datasets (JPP, FAMC, AXF), pp. 4162–4165.
- ICPR-2010-SodaI #composition #integration #learning
- Decomposition Methods and Learning Approaches for Imbalanced Dataset: An Experimental Integration (PS, GI), pp. 3117–3120.
- ICPR-2010-StottingerZKH #evaluation
- FeEval: A Dataset for Evaluation of Spatio-temporal Local Features (JS, SZ, RK, AH), pp. 499–502.
- RecSys-2010-PilaszyZT #feedback #matrix #performance
- Fast ALS-based matrix factorization for explicit and implicit feedback datasets (IP, DZ, DT), pp. 71–78.
- PLDI-2010-ChenHEFPTW #optimisation
- Evaluating iterative optimization across 1000 datasets (YC, YH, LE, GF, LP, OT, CW), pp. 448–459.
- SAC-2010-BaralisCC #mining #persistent #scalability
- A persistent HY-Tree to efficiently support itemset mining on large datasets (EB, TC, SC), pp. 1060–1064.
- SAC-2010-LuccheseOP #generative #mining
- A generative pattern model for mining binary datasets (CL, SO, RP), pp. 1109–1110.
- STOC-2010-BravermanO #independence
- Measuring independence of datasets (VB, RO), pp. 271–280.
- ICDAR-2009-AntonacopoulosBPP #analysis #documentation #evaluation #layout #performance
- A Realistic Dataset for Performance Evaluation of Document Layout Analysis (AA, DB, CP, SP), pp. 296–300.
- CHI-2009-LeeSRCT #named #roadmap
- FacetLens: exposing trends and relationships to support sensemaking within faceted datasets (BL, GS, GGR, MC, DST), pp. 1293–1302.
- CIKM-2009-BalachandranPK #clustering #configuration management #documentation
- Interpretable and reconfigurable clustering of document datasets by deriving word-based rules (VB, DP, DK), pp. 1773–1776.
- CIKM-2009-Muntes-MuleroN #privacy #scalability
- Privacy and anonymization for very large datasets (VMM, JN), pp. 2117–2118.
- CIKM-2009-StoyanovichA #clustering
- Rank-aware clustering of structured datasets (JS, SAY), pp. 1429–1432.
- KDD-2009-DundarHBRR #case study #detection #learning #using
- Learning with a non-exhaustive training dataset: a case study: detection of bacteria cultures using optical-scattering technology (MD, EDH, AKB, JPR, BR), pp. 279–288.
- MLDM-2009-CelepcikayEO #using
- Regional Pattern Discovery in Geo-referenced Datasets Using PCA (OUC, CFE, CO), pp. 719–733.
- MLDM-2009-SegataB #performance #scalability
- Fast Local Support Vector Machines for Large Datasets (NS, EB), pp. 295–310.
- ESEC-FSE-2009-BirdBADBFD #bias #debugging
- Fair and balanced?: bias in bug-fix datasets (CB, AB, EA, JD, AB, VF, PTD), pp. 121–130.
- SAC-2009-OwensMR #mining
- Capturing truthiness: mining truth tables in binary datasets (CCOI, TMM, NR), pp. 1467–1474.
- ECDL-2008-BindingMT #semantics
- Semantic Interoperability in Archaeological Datasets: Data Mapping and Extraction Via the CIDOC CRM (CB, KM, DT), pp. 280–290.
- SIGMOD-2008-PanZW #clustering #composition #matrix #named #performance #scalability
- CRD: fast co-clustering on large datasets utilizing sampling-based matrix decomposition (FP, XZ, WW), pp. 173–184.
- EDM-2008-VenturaRH #education #evaluation #framework #metric
- Analyzing Rule Evaluation Measures with Educational Datasets: A Framework to Help the Teacher (SV, CR, CH), pp. 177–181.
- ICEIS-DISI-2008-MuldnerML #data access #policy #xml
- Succinct Access Control Policies for Published XML Datasets (TM, JKM, GL), pp. 380–385.
- CIKM-2008-ChoudharyMB #evolution #on the
- On quantifying changes in temporally evolving dataset (RC, SM, AB), pp. 1459–1460.
- CIKM-2008-LiuLNBMG #feature model #performance #preprocessor #realtime #scalability
- Real-time data pre-processing technique for efficient feature extraction in large scale datasets (YL, LVL, RSN, KB, PM, CLG), pp. 981–990.
- CIKM-2008-NguyenS #analysis #correlation #performance
- Fast correlation analysis on time series datasets (PN, NS), pp. 787–796.
- CIKM-2008-ShawXG #approximate
- Deriving non-redundant approximate association rules from hierarchical datasets (GS, YX, SG), pp. 1451–1452.
- ICML-2008-WolfeHK #distributed #scalability
- Fully distributed EM for very large datasets (JW, AH, DK), pp. 1184–1191.
- ICPR-2008-WatanabeK08a #scalability
- RANSAC-SVM for large-scale datasets (KW, TK), pp. 1–4.
- KDD-2008-DasSN #category theory #detection
- Anomaly pattern detection in categorical datasets (KD, JGS, DBN), pp. 169–176.
- FSE-2008-OsterweilCEPWBH #experience #process #using #workflow
- Experience in using a process language to define scientific workflow and generate dataset provenance (LJO, LAC, AME, RMP, AEW, ERB, JLH), pp. 319–329.
- SIGMOD-2007-XiaoT #named #privacy #towards
- M-invariance: towards privacy preserving re-publication of dynamic datasets (XX, YT), pp. 689–700.
- HIMI-IIE-2007-CollinsNMRBCPFHP #collaboration
- The Karst Collaborative Workspace for Analyzing and Annotating Scientific Datasets (LMC, DEN, MLBM, JVR, MAB, CRC, JEP, BFS, SKH, JCP), pp. 3–12.
- KDD-2007-DasS #category theory #detection
- Detecting anomalous records in categorical datasets (KD, JGS), pp. 220–229.
- PADL-2007-Costa #performance #prolog
- Prolog Performance on Larger Datasets (VSC), pp. 185–199.
- DRR-2006-ZhangA #towards
- Toward quantifying the amount of style in a dataset (XZ, SA).
- SIGMOD-2006-KiferG #injection
- Injecting utility into anonymized datasets (DK, JG), pp. 217–228.
- VLDB-2006-GemullaLH #evolution #maintenance
- A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets (RG, WL, PJH), pp. 595–606.
- VLDB-2006-JiTT #3d #mining
- Mining Frequent Closed Cubes in 3D Datasets (LJ, KLT, AKHT), pp. 811–822.
- ICPR-v1-2006-AnC
- Finding Rule Groups to Classify High Dimensional Gene Expression Datasets (JA, YPPC), pp. 1196–1199.
- ICPR-v3-2006-FelsbergG #multi #named #robust #scalability
- P-Channels: Robust Multivariate M-Estimation of Large Datasets (MF, GHG), pp. 262–267.
- ICPR-v4-2006-Jun #comparison #detection
- A Peer Dataset Comparison Outlier Detection Model Applied to Financial Surveillance (TJ), pp. 900–903.
- SAC-2006-FangG #bound
- Boundary surface extraction and rendering for volume datasets (SF, PG), pp. 1356–1360.
- SAC-2006-NemalhabibS #algorithm #category theory #clustering #named
- CLUC: a natural clustering algorithm for categorical datasets based on cohesion (AN, NS), pp. 637–638.
- ICDAR-2005-AgrawalBMV #named #online #representation #xml
- UPX: A New XML Representation for Annotated Datasets of Online Handwriting Data (MA, KB, SM, LV), pp. 1161–1165.
- ICEIS-v2-2005-DoP #mining #scalability #visualisation
- Mining Very Large Datasets with SVM and Visualization (TND, FP), pp. 127–141.
- KDD-2005-JinWPPA #graph
- Discovering frequent topological structures from graph datasets (RJ, CW, DP, SP, GA), pp. 606–611.
- KDD-2005-ZakiPAS #algorithm #category theory #clustering #effectiveness #mining #named
- CLICKS: an effective algorithm for mining subspace clusters in categorical datasets (MJZ, MP, IA, TS), pp. 736–742.
- MLDM-2005-FerrandizB05a #evaluation
- Supervised Evaluation of Dataset Partitions: Advantages and Practice (SF, MB), pp. 600–609.
- MLDM-2005-SiaL #clustering #scalability #using
- Clustering Large Dynamic Datasets Using Exemplar Points (WS, MML), pp. 163–173.
- SIGMOD-2004-CongXPTY #array #named
- FARMER: Finding Interesting Rule Groups in Microarray Datasets (GC, AKHT, XX, FP, JY), pp. 143–154.
- VLDB-2004-HoweM #algebra
- Algebraic Manipulation of Scientific Datasets (BH, DM), pp. 924–935.
- CIKM-2004-ChenL #clustering #named #scalability #visualisation
- ClusterMap: labeling clusters in large datasets via visualization (KC, LL), pp. 285–293.
- CIKM-2004-ChungJM #clustering #mining #using
- Mining gene expression datasets using density-based clustering (SC, JJ, DM), pp. 150–151.
- ICPR-v4-2004-WillisC #3d #multi #symmetry
- Alignment of Multiple Non-Overlapping Axially Symmetric 3D Datasets (ARW, DBC), pp. 96–99.
- KDD-2004-TruongLB #learning #random #using
- Learning a complex metabolomic dataset using random forests and support vector machines (YT, XL, CB), pp. 835–840.
- SIGIR-2004-DavidovGM #categorisation #generative
- Parameterized generation of labeled datasets for text categorization based on a hierarchical directory (DD, EG, SM), pp. 250–257.
- SAC-2004-AdamJA #detection
- Neighborhood based detection of anomalies in high dimensional spatio-temporal sensor datasets (NRA, VPJ, VA), pp. 576–583.
- SAC-2004-CarswellGN #multi #semantics #transaction
- Wireless spatio-semantic transactions on multimedia datasets (JDC, KG, MN), pp. 1201–1205.
- DRR-2003-HauserSSDST
- Correcting OCR text by association with historical datasets (SEH, JS, TFS, DDF, SS, GRT), pp. 84–93.
- VLDB-2003-LinLYZ #multi #scalability
- Multiscale Histograms: Summarizing Topological Relations in Large Spatial Datasets (XL, QL, YY, XZ), pp. 814–825.
- ICEIS-v2-2003-DoP #algorithm #mining #scalability
- Mining Very Large Datasets with Support Vector Machine Algorithms (TND, FP), pp. 140–147.
- ICML-2003-LeskovecS #linear #programming
- Linear Programming Boosting for Uneven Datasets (JL, JST), pp. 456–463.
- ICML-2003-ZhuWC #scalability
- Eliminating Class Noise in Large Datasets (XZ, XW, QC), pp. 920–927.
- KDD-2003-El-HajjZ #interactive #matrix #mining #performance #scalability
- Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining (MEH, ORZ), pp. 109–118.
- KDD-2003-KoyuturkG #framework #named
- PROXIMUS: a framework for analyzing very high dimensional discrete-attributed datasets (MK, AG), pp. 147–156.
- KDD-2003-PanCTYZ #biology #named
- Carpenter: finding closed patterns in long biological datasets (FP, GC, AKHT, JY, MJZ), pp. 637–642.
- KDD-2003-PeterCG #algorithm #clustering #scalability
- New unsupervised clustering algorithm for large datasets (WP, JC, CG), pp. 643–648.
- MLDM-2003-LeleuRBE #mining #named
- GO-SPADE: Mining Sequential Patterns over Datasets with Consecutive Repetitions (ML, CR, JFB, GE), pp. 293–306.
- SAC-2003-ChenL #clustering #visualisation
- Cluster Rendering of Skewed Datasets via Visualization (KC, LL), pp. 909–916.
- CIKM-2002-ZhaoK #algorithm #clustering #documentation #evaluation
- Evaluation of hierarchical clustering algorithms for document datasets (YZ, GK), pp. 515–524.
- KDD-2002-RidgewayM #analysis
- Bayesian analysis of massive datasets via particle filters (GR, DM), pp. 5–13.
- KDD-2002-TantrumMS #clustering #modelling #scalability
- Hierarchical model-based clustering of large datasets through fractionation and refractionation (JT, AM, WS), pp. 183–190.
- ICEIS-v1-2001-KotsisWFM #multi #novel #visualisation
- Novel Data Visualisation and Exploration in Multidimensional Datasets (NK, GRSW, JDF, DRM), pp. 170–175.
- ICML-2001-DomeniconiG #approach #approximate #classification #multi #nearest neighbour #performance #query #scalability
- An Efficient Approach for Approximating Multi-dimensional Range Queries and Nearest Neighbor Classification in Large Datasets (CD, DG), pp. 98–105.
- KDD-2001-BeygelzimerPM #category theory #performance #scalability #visualisation
- Fast ordering of large categorical datasets for better visualization (AB, CSP, SM), pp. 239–244.
- KDD-2000-BarbaraC #clustering #using
- Using the fractal dimension to cluster datasets (DB, PC), pp. 260–264.
- KDD-2000-Yang #3d #interactive #relational #scalability
- Interactive exploration of very large relational datasets through 3D dynamic projections (LY), pp. 236–243.
- KDD-2000-ZhangDR #constraints #scalability
- Exploring constraints to efficiently mine emerging patterns from large high-dimensional datasets (XZ, GD, KR), pp. 310–314.
- SIGMOD-1999-MankuRL #online #order #performance #random #scalability #statistics
- Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets (GSM, SR, BGL), pp. 251–262.
- KDD-1999-DaviesM #network
- Bayesian Networks for Lossless Dataset Compression (SD, AWM), pp. 387–391.
- VLDB-1998-GehrkeRG #framework #named #performance #scalability
- RainForest — A Framework for Fast Decision Tree Construction of Large Datasets (JG, RR, VG), pp. 416–427.
- VLDB-1998-KnorrN #algorithm #mining #scalability
- Algorithms for Mining Distance-Based Outliers in Large Datasets (EMK, RTN), pp. 392–403.
- VLDB-1998-ShuklaDN #multi
- Materialized View Selection for Multidimensional Datasets (AS, PD, JFN), pp. 488–499.
- KDD-1998-AlsabtiRS #classification #named #scalability
- CLOUDS: A Decision Tree Classifier for Large Datasets (KA, SR, VS), pp. 2–8.
- KDD-1998-OatesJ #modelling #scalability
- Large Datasets Lead to Overly Complex Models: An Explanation and a Solution (TO, DJ), pp. 294–298.
- SIGMOD-1997-KornJF #ad hoc #query #scalability #sequence
- Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences (FK, HVJ, CF), pp. 289–300.
- SIGMOD-1997-LivnyRBCDLMW #named #query #scalability #visualisation
- DEVise: Integrated Querying and Visualization of Large Datasets (ML, RR, KSB, GC, DD, SL, JM, RKW), pp. 301–312.
- SIGMOD-1997-LivnyRBCDLMW97a #named #query #scalability #visual notation
- DEVise: Integrated Querying and Visual Exploration of Large Datasets (ML, RR, KSB, GC, DD, SL, JM, RKW), pp. 517–520.
- KDD-1997-ZupanBBC #approach #composition #data mining #mining
- A Dataset Decomposition Approach to Data Mining and Machine Discovery (BZ, MB, IB, BC), pp. 299–302.
- SIGMOD-1995-FaloutsosL #algorithm #multi #named #performance #visualisation
- FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets (CF, KIL), pp. 163–174.
- KDD-1995-StolorzNMMSSYNCMF #data mining #mining #performance #scalability
- Fast Spatio-Temporal Data Mining of Large Geophysical Datasets (PES, HN, EM, RRM, ECS, JRS, JY, KWN, SYC, CRM, JDF), pp. 300–305.