BibSLEIGH

Tag #dataset

294 papers:

EDMEDM-2019-JensenHD #detection #modelling #student
Generalizability of Sensor-Free Affect Detection Models in a Longitudinal Dataset of Tens of Thousands of Students (EJ, SH, SKD).
ICSMEICSME-2019-LevinY #scalability #source code
Processing Large Datasets of Fined Grained Source Code Changes (SL, AY), pp. 382–385.
ICSMEICSME-2019-NewmanDAPKH19a #open data
An Open Dataset of Abbreviations and Expansions (CDN, MJD, RSA, AP, DK, EH0), p. 280.
MSRMSR-2019-AhluwaliaFP #fault #named #predict
Snoring: a noise in defect prediction datasets (AA, DF, MDP), pp. 63–67.
MSRMSR-2019-BiswasIHR #python
Boa meets python: a boa dataset of data science software in python language (SB, MJI, YH, HR), pp. 577–581.
MSRMSR-2019-JoshiC #agile #git #named
RapidRelease: a dataset of projects and issues on github with rapid releases (SDJ, SC), pp. 587–591.
MSRMSR-2019-PietriSZ #development #graph
The software heritage graph dataset: public software development under one roof (AP, DS, SZ), pp. 138–142.
MSRMSR-2019-PontaPSBD #open source
A manually-curated dataset of fixes to vulnerabilities of open-source software (SEP, HP, AS, MB, CD), pp. 383–387.
MSRMSR-2019-RaduN #debugging #non-functional
A dataset of non-functional bugs (AR, SN), pp. 399–403.
MSRMSR-2019-WangSL0 #android #metadata #named #reliability #towards
RmvDroid: towards a reliable Android malware dataset with app metadata (HW, JS, HL, YG0), pp. 404–408.
MSRMSR-2019-WickertREDM #encryption #parametricity
A dataset of parametric cryptographic misuses (AKW, MR, ME, AD, MM), pp. 96–100.
CIKMCIKM-2019-ChenMLZM #named #scalability #web
TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions (JC, JM, YL, MZ0, SM), pp. 2485–2488.
CIKMCIKM-2019-ChenWCKQ #generative #query #towards
Towards More Usable Dataset Search: From Query Characterization to Snippet Generation (JC, XW, GC0, EK, YQ), pp. 2445–2448.
CIKMCIKM-2019-Cohen-ShapiraRS #named #recommendation #representation #visual notation
AutoGRD: Model Recommendation Through Graphical Dataset Representation (NCS, LR, BS, GK, RV), pp. 821–830.
CIKMCIKM-2019-SunAJHS #flexibility #named
MithraLabel: Flexible Dataset Nutritional Labels for Responsible Data Science (CS, AA, HVJ, BH, JS), pp. 2893–2896.
ECIRECIR-p1-2019-LinjordetB #modelling
Impact of Training Dataset Size on Neural Answer Selection Models (TL, KB), pp. 828–835.
ECIRECIR-p2-2019-Dosso #keyword #rdf
Keyword Search on RDF Datasets (DD), pp. 332–336.
ICMLICML-2019-CornishVBDD #scalability
Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets (RC, PV, ABC, GD, AD), pp. 1351–1360.
ICMLICML-2019-GhadikolaeiGFS #big data #learning
Learning and Data Selection in Big Datasets (HSG, HGG, CF, MS), pp. 2191–2200.
ESEC-FSEESEC-FSE-2019-MiryeganehAH #approach #automation #integration #towards
An IR-based approach towards automated integration of geo-spatial datasets in map-based software systems (NM, MA, HH), pp. 946–954.
ICSE-2019-DmeiriTWBLDVR #mining #named
BugSwarm: mining and continuously growing a dataset of reproducible failures and fixes (DAT, ND, YW, AB, YCL, PTD, BV, CRG), pp. 339–349.
ICTSSICTSS-2019-NakajimaC #generative #machine learning #source code #testing
Generating Biased Dataset for Metamorphic Testing of Machine Learning Programs (SN0, TYC), pp. 56–64.
ICSMEICSME-2018-0008ZOPLB #analysis #re-engineering #sentiment
Two Datasets for Sentiment Analysis in Software Engineering (BL0, FZ, RO, MDP, ML, GB), p. 712.
MSRMSR-2018-GaoYJLYZ #concurrent #named #testing
Jbench: a dataset of data races for concurrency testing (JG, XY, YJ0, HL0, WY, XZ), pp. 6–9.
MSRMSR-2018-GeigerMPPNB #android #commit #graph
A graph-based dataset of commit history of real-world Android apps (FXG, IM, LP, FP, DDN, AB), pp. 30–33.
MSRMSR-2018-GkortzisMS #named #open source #security
VulinOSS: a dataset of security vulnerabilities in open-source systems (AG, DM, DS), pp. 18–21.
MSRMSR-2018-MarkovtsevL #git
Public git archive: a big code dataset for all (VM, WL), pp. 34–37.
MSRMSR-2018-MartinsAL #java #named
50K-C: a dataset of compilable, and compiled, Java projects (PM0, RA, CVL), pp. 1–5.
MSRMSR-2018-ProkschAN #developer #empirical #process
Enriched event streams: a general dataset for empirical studies on in-IDE activities of software developers (SP, SA, SN), pp. 62–65.
MSRMSR-2018-SahaLLYP #debugging #java #scalability
Bugs.jar: a large-scale, diverse dataset of real-world Java bugs (RKS, YL, WL, HY, MRP), pp. 10–13.
MSRMSR-2018-XuZ #kernel #linux #multi
A multi-level dataset of linux kernel patchwork (YX, MZ), pp. 54–57.
MSRMSR-2018-YuLYWW #git
A dataset of duplicate pull-requests in github (YY0, ZL, GY, TW0, HW), pp. 22–25.
SANERSANER-2018-SobreiraDDMM #debugging #fault
Dissection of a bug dataset: Anatomy of 395 patches from Defects4J (VS, TD, FM, MM, MdAM), pp. 130–140.
CoGCIG-2018-AungBDCKYW #learning #predict #scalability
Predicting Skill Learning in a Large, Longitudinal MOBA Dataset (MA, VB, AD, PIC, AVK, CY, ARW), pp. 1–7.
CoGCIG-2018-VariaTKK #3d #analysis #game studies
A Refined 3D Dataset for the Analysis of Player Actions in Exertion Games (CV, GT, KK, SDK), pp. 1–4.
CIKMCIKM-2018-AfsharPPSHS #named #scalability
COPA: Constrained PARAFAC2 for Sparse & Large Datasets (AA, IP, EEP, ES, JCH, JS), pp. 793–802.
CIKMCIKM-2018-HoangVN #benchmark #detection #metric #named #topic
W2E: A Worldwide-Event Benchmark Dataset for Topic Detection and Tracking (TAH, KDV, WN), pp. 1847–1850.
CIKMCIKM-2018-LevchenkoYAMKS #distributed #named #sketching
Spark-parSketch: A Massively Distributed Indexing of Time Series Datasets (OL, DEY, RA, FM, BK, DES), pp. 1951–1954.
ECIRECIR-2018-DurRF #architecture #benchmark #challenge #lessons learnt #metric
Reproducing a Neural Question Answering Architecture Applied to the SQuAD Benchmark Dataset: Challenges and Lessons Learned (AD, AR, PF), pp. 102–113.
ICMLICML-2018-YoonJS18a #generative #modelling #multi #named #network #predict #using
RadialGAN: Leveraging multiple datasets to improve target-specific predictive models using Generative Adversarial Networks (JY, JJ, MvdS), pp. 5685–5693.
ICPRICPR-2018-ChowdhuryAT0R #recognition
MSU-AVIS dataset: Fusing Face and Voice Modalities for Biometric Recognition in Indoor Surveillance Videos (AC, YA, LT, XL0, AR), pp. 3567–3573.
ICPRICPR-2018-FerrariBB #identification
Extended YouTube Faces: a Dataset for Heterogeneous Open-Set Face Identification (CF, SB, ADB), pp. 3408–3413.
ICPRICPR-2018-LiangLJXL #benchmark #metric #multi #named #predict
SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction (LL, LL, LJ, DX, ML), pp. 1598–1603.
ICPRICPR-2018-MerchantSKM #image #using
Appearance-based data augmentation for image datasets using contrast preserving sampling (AKM, TQS, BK, RM), pp. 1235–1240.
ICPRICPR-2018-TranLPHKTNP #analysis #multi
A multi-modal multi-view dataset for human fall analysis and preliminary investigation on modality (THT, TLL, DTP, VNH, VMK, QTT, TSN, CP), pp. 1947–1952.
ICPRICPR-2018-TuggenerESPS #classification #detection #segmentation
DeepScores - A Dataset for Segmentation, Detection and Classification of Tiny Objects (LT, IE, JS, MP, TS), pp. 3704–3709.
KDDKDD-2018-ChenCL #open data
Rotation-blended CNNs on a New Open Dataset for Tropical Cyclone Image-to-intensity Regression (BC, BFC, HTL), pp. 90–99.
KDDKDD-2018-RijnH
Hyperparameter Importance Across Datasets (JNvR, FH), pp. 2367–2376.
JCDLJCDL-2017-DuretecR0 #benchmark #metric
A Text Extraction Software Benchmark Based on a Synthesized Dataset (KD, AR, CB0), pp. 109–118.
JCDLJCDL-2017-SinghNGBMG #behaviour #case study #reuse
Citation Sentence Reuse Behavior of Scientists: A Case Study on Massive Bibliographic Text Dataset of Computer Science (MS0, AN, DG, NAB, AM0, PG), pp. 277–280.
MSRMSR-2017-AivaloglouHMR #source code
A dataset of scratch programs: scraped, shaped and scored (EA, FH, JML, GR), pp. 511–514.
MSRMSR-2017-MadeyskiK #fault #idea #predict
Continuous defect prediction: the idea and a related dataset (LM, MK), pp. 515–518.
MSRMSR-2017-OrellanaLMD #difference #integration #on the #testing
On the differences between unit and integration testing in the travistorrent dataset (GO, GL, AM, SD), pp. 451–454.
MSRMSR-2017-RoblesHHCF #git #modelling #uml
An extensive dataset of UML models in GitHub (GR, THQ, RH, MRVC, MAF), pp. 519–522.
MSRMSR-2017-SadatBM
Rediscovery datasets: connecting duplicate reports (MS, ABB, AVM), pp. 527–530.
MSRMSR-2017-ZhuLRC #semantics #version control
A dataset for dynamic discovery of semantic changes in version controlled software histories (CZ, YL0, JR, MC), pp. 523–526.
AIIDEAIIDE-2017-LinGKS #named #research
STARDATA: A StarCraft AI Research Dataset (ZL, JG, VK, GS), pp. 50–56.
ECIRECIR-2017-BhattacharjeeA #algorithm #clustering #incremental #nearest neighbour
Batch Incremental Shared Nearest Neighbor Density Based Clustering Algorithm for Dynamic Datasets (PB, AA), pp. 568–574.
ICMLICML-2017-ValeraG #automation #statistics
Automatic Discovery of the Statistical Types of Variables in a Dataset (IV, ZG), pp. 3521–3529.
ICMLICML-2017-ZhouZIJWS #multi #testing
When can Multi-Site Datasets be Pooled for Regression? Hypothesis Tests, l₂-consistency and Neuroscience Applications (HHZ, YZ, VKI, SCJ, GW, VS), pp. 4170–4179.
MSRMSR-2016-CosentinoIC #git
Findings from GitHub: methods, datasets and limitations (VC, JLCI, JC), pp. 137–141.
MSRMSR-2016-ProkschANM #c# #syntax
A dataset of simplified syntax trees for C# (SP, SA, SN, MM), pp. 476–479.
MSRMSR-2016-YangKYI #code review #mining #overview #people #process #repository
Mining the modern code review repositories: a dataset of people, process and product (XY, RGK, NY, HI), pp. 460–463.
MSRMSR-2016-ZhuZM #issue tracking #multi
Multi-extract and multi-level dataset of mozilla issue tracking history (JZ, MZ, HM), pp. 472–475.
SANERSANER-2016-KadarHFG #assessment #maintenance #refactoring
A Code Refactoring Dataset and Its Assessment Regarding Software Maintainability (IK, PH, RF, TG), pp. 599–603.
CIKMCIKM-2016-BleifussBFRW0PN #approximate #dependence #functional #scalability
Approximate Discovery of Functional Dependencies for Large Datasets (TB, SB, JF, JR, GW, SK0, TP, FN), pp. 1803–1812.
CIKMCIKM-2016-CaoY #benchmark #metric #modelling #named #network #platform #social
ASNets: A Benchmark Dataset of Aligned Social Networks for Cross-Platform User Modeling (XC, YY0), pp. 1881–1884.
CIKMCIKM-2016-NguyenTTN #named #social #summary
SoLSCSum: A Linked Sentence-Comment Dataset for Social Context Summarization (MTN, CXT, DVT, MLN), pp. 2409–2412.
ECIRECIR-2016-BotevaGSR #information retrieval #learning #rank
A Full-Text Learning to Rank Dataset for Medical Information Retrieval (VB, DGG, AS, SR), pp. 716–722.
ICPRICPR-2016-MoazzenT #approximate #clustering
Sampling based approximate spectral clustering ensemble for partitioning datasets (YM, KT), pp. 1630–1635.
ICPRICPR-2016-MurguiaRA #adaptation #architecture #evaluation #modelling #network #parallel
Evaluation of the background modeling method Auto-Adaptive Parallel Neural Network Architecture in the SBMnet dataset (MICM, JARQ, GRA), pp. 137–142.
ICPRICPR-2016-OrtegoSM #estimation #multi #re-engineering
Rejection based multipath reconstruction for background estimation in SBMnet 2016 dataset (DO, JCS, JMM0), pp. 114–119.
ICPRICPR-2016-YuenMT #algorithm #evaluation #on the
On looking at faces in an automobile: Issues, algorithms and evaluation on naturalistic driving dataset (KY, SM, MMT), pp. 2777–2782.
KDDKDD-2016-GaoP #named #relational
Squish: Near-Optimal Compression for Archival of Relational Datasets (YG, AGP), pp. 1575–1584.
KDDKDD-2016-MaiAS #algorithm #clustering #named #performance #scalability
AnyDBC: An Efficient Anytime Density-based Clustering Algorithm for Very Large Complex Datasets (STM, IA, MS), pp. 1025–1034.
DRRDRR-2015-ChenSWLHI #analysis #documentation #layout
Ground truth model, tool, and dataset for layout analysis of historical documents (KC, MS, HW, ML, JH, RI), p. 940204.
HTHT-2015-RoutB #algorithm #ranking #twitter
A Human-annotated Dataset for Evaluating Tweet Ranking Algorithms (DPR, KB), pp. 95–99.
PODSPODS-2015-Cormode #scalability #summary
Compact Summaries over Large Datasets (GC), pp. 157–158.
TPDLTPDL-2015-LlewellynGAOT #topic #twitter
Extracting a Topic Specific Dataset from a Twitter Archive (CL, CG, BA, JO, RT), pp. 364–367.
VLDBVLDB-2015-BhattacherjeeCH #trade-off #version control
Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff (SB, AC, SH, AD, AGP), pp. 1346–1357.
VLDBVLDB-2015-HarbiAKM #query #rdf
Evaluating SPARQL Queries on Massive RDF Datasets (RH, IA, PK, NM), pp. 1848–1859.
EDMEDM-2015-VossSMS #approach #learning #matrix
A Transfer Learning Approach for Applying Matrix Factorization to Small ITS Datasets (LV, CS, CM, LST), pp. 372–375.
ICSMEICSME-2015-JansenH #industrial #smell #spreadsheet
Code smells in spreadsheet formulas revisited on an industrial dataset (BJ, FH), pp. 372–380.
MSRMSR-2015-AltingerSDW #embedded #fault #industrial #modelling #novel #predict
A Novel Industry Grade Dataset for Fault Prediction Based on Model-Driven Developed Automotive Embedded Software (HA, SS, YD, FW), pp. 494–497.
MSRMSR-2015-GermanAH #git #linux #process
A Dataset of the Activity of the Git Super-repository of Linux in 2012 (DMG, BA, AEH), pp. 470–473.
MSRMSR-2015-HabayebMMBB #fault
The Firefox Temporal Defect Dataset (MH, AVM, SSM, LB, AB), pp. 498–501.
MSRMSR-2015-KrutzMMRPFS #android #open source
A Dataset of Open-Source Android Applications (DEK, MM, SAM, AR, JP, AF, JS), pp. 522–525.
MSRMSR-2015-MauczkaBSG #commit #developer
Dataset of Developer-Labeled Commit Messages (AM, FB, CS, TG), pp. 490–493.
MSRMSR-2015-OhiraKYYMLFHIM #classification #debugging
A Dataset of High Impact Bugs: Manually-Classified Issue Reports (MO, YK, YY, HY, YM, NL, KF, HH, AI, KiM), pp. 518–521.
MSRMSR-2015-PalombaNTBOPL #evaluation #named #open data #smell
Landfill: An Open Dataset of Code Smells with Public Evaluation (FP, DDN, MT, GB, RO, DP, ADL), pp. 482–485.
MSRMSR-2015-SawantB #api
A Dataset for API Usage (AAS, AB), pp. 506–509.
MSRMSR-2015-WermelingerY #architecture #evolution
An Architectural Evolution Dataset (MW, YY), pp. 502–505.
MSRMSR-2015-Zacchiroli #metadata #source code
The Debsources Dataset: Two Decades of Debian Source Code Metadata (SZ), pp. 466–469.
SCAMSCAM-2015-AivaloglouHH #scalability #spreadsheet
A grammar for spreadsheet formulas evaluated on two large datasets (EA, DH, FH), pp. 121–130.
CoGVS-Games-2015-ChalasFFSK #3d #generative
Generation of Variable Human Faces from 3D Scan Dataset (IC, ZF, KF, JS, BK), pp. 1–8.
CSCWCSCW-2015-QuattroneCM #bias
There’s No Such Thing as the Perfect Map: Quantifying Bias in Spatial Crowd-sourcing Datasets (GQ, LC, PDM), pp. 1021–1032.
ICEISICEIS-v2-2015-SarinhoLS #linked data #open data #question
Can You Find All the Data You Expect in a Linked Dataset? (WTS, BFL, DS), pp. 648–655.
CIKMCIKM-2015-LiXJL #adaptation #approach
Differentially Private Histogram Publication for Dynamic Datasets: an Adaptive Sampling Approach (HL, LX0, XJ, JL), pp. 1001–1010.
CIKMCIKM-2015-SinghPK0MG #case study #predict
The Role Of Citation Context In Predicting Long-Term Citation Profiles: An Experimental Study Based On A Massive Bibliographic Text Dataset (MS0, VP, SK, TC0, AM0, PG), pp. 1271–1280.
ICMLICML-2015-BarbosaENW #distributed #power of
The Power of Randomization: Distributed Submodular Maximization on Massive Datasets (RdPB, AE, HLN, JW), pp. 1236–1244.
ICMLICML-2015-MaLF #analysis #canonical #correlation #linear #scalability
Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis (ZM, YL, DPF), pp. 169–178.
KDDKDD-2015-CaoWYR #online #scalability
Online Outlier Exploration Over Large Datasets (LC, MW, DY, EAR), pp. 89–98.
RecSysRecSys-2015-Ben-ShimonTFSRH #challenge
RecSys Challenge 2015 and the YOOCHOOSE Dataset (DBS, AT, MF, BS, LR, JH), pp. 357–358.
SIGIRSIGIR-2015-MorenoD #adaptation #metric #semistructured data
Adapted B-CUBED Metrics to Unbalanced Datasets (JGM, GD), pp. 911–914.
ASEASE-2015-NamK #fault #named #predict
CLAMI: Defect Prediction on Unlabeled Datasets (T) (JN, SK), pp. 452–463.
ICSEICSE-v2-2015-HermansM #analysis #email #spreadsheet
Enron’s Spreadsheets and Related Emails: A Dataset and Analysis (FH, ERMH), pp. 7–16.
SACSAC-2015-RochaRCOMVADGF #algorithm #classification #documentation #named #performance #using
G-KNN: an efficient document classification algorithm for sparse datasets on GPUs using KNN (LCdR, GSR, RC, RSO, DM, FV, GA, SD, MAG, RF), pp. 1335–1338.
DRRDRR-2014-BrunoL #documentation #open data #recognition #research
The Lehigh Steel Collection: a new open dataset for document recognition research (BB, DPL), p. ?–9.
JCDLJCDL-2014-CastroSR #data transformation #lightweight #ontology #research #workflow
Creating lightweight ontologies for dataset description: practical applications in a cross-domain research data management workflow (JAC, JRdS, CR), pp. 313–316.
JCDLJCDL-2014-HasanGFBM #framework #library
Data mapping framework in a digital library with computational epidemiology datasets (SMSH, SG, EAF, KRB, MVM), pp. 449–450.
JCDLJCDL-2014-LlewellynRBKSJ #information management
Building a dataset of sensitive information (CL, LR, RB, SK, MS, RvJ), pp. 493–494.
SIGMODSIGMOD-2014-SatishSPSPHSYD #framework #graph #navigation #using
Navigating the maze of graph analytics frameworks using massive graph datasets (NS, NS, MMAP, JS, JP, MAH, SS, ZY, PD), pp. 979–990.
VLDBVLDB-2015-MozafariSFJM14 #learning #scalability
Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning (BM, PS, MJF, MIJ, SM), pp. 125–136.
EDMEDM-2014-LiuMK #testing
Interpreting model discovery and testing generalization to a new dataset (RL0, EAM, KRK), pp. 107–113.
ICSMEICSME-2014-ThongtanunamYYKCFI #code review #named #overview #visualisation
ReDA: A Web-Based Visualization Tool for Analyzing Modern Code Review Dataset (PT, XY, NY, RGK, AECC, KF, HI), pp. 605–608.
MSRMSR-2014-GousiosZ #development #research
A dataset for pull-based development research (GG, AZ), pp. 368–371.
MSRMSR-2014-LazarRS14a #debugging #generative
Generating duplicate bug datasets (AL, SR, BS), pp. 392–395.
MSRMSR-2014-MurakamiHK
A dataset of clone references with gaps (HM, YH, SK), pp. 412–415.
MSRMSR-2014-PassosC #feature model #kernel #linux
A dataset of feature additions and feature removals from the Linux kernel (LTP, KC), pp. 376–379.
MSRMSR-2014-RoblesRSVG #challenge #overview
FLOSS 2013: a survey dataset about free software contributors: challenges for curating, sharing, and combining (GR, LAR, AS, BV, JMGB), pp. 396–399.
MSRMSR-2014-SainiSOL #debugging
A dataset for maven artifacts and bug patterns found in them (VS, HS, JO, CVL), pp. 416–419.
MSRMSR-2014-WilliamsRMRK #modelling
Models of OSS project meta-information: a dataset of three forges (JRW, DDR, NDM, JDR, DSK), pp. 408–411.
MSRMSR-2014-ZhangH #energy #mining
A green miner’s dataset: mining the impact of software change on energy consumption (CZ, AH), pp. 400–403.
HCIHCI-AIMT-2014-RuffieuxLMK #gesture #overview #recognition
A Survey of Datasets for Human Gesture Recognition (SR, DL, EM, OAK), pp. 337–348.
HCIHIMI-DE-2014-GombosK #query #recommendation
SPARQL Query Writing with Recommendations Based on Datasets (GG, AK), pp. 310–319.
ICEISICEIS-v1-2014-TimoteoVF #analysis #case study #network #project management
Evaluating Artificial Neural Networks and Traditional Approaches for Risk Analysis in Software Project Management — A Case Study with PERIL Dataset (CT, MV, SF), pp. 472–479.
ECIRECIR-2014-BelloginSVS #challenge #evaluation #web
Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track (AB, TS, APdV, AS), pp. 430–436.
ECIRECIR-2014-CarageaWCWRCWG #big data
CiteSeerX: A Scholarly Big Dataset (CC, JW, AMC, KW, JPFR, HHC, ZW, CLG), pp. 311–322.
ICPRICPR-2014-Calvo-ZaragozaO #music #recognition
Recognition of Pen-Based Music Notation: The HOMUS Dataset (JCZ, JO), pp. 3038–3043.
ICPRICPR-2014-FletcherI #evaluation #quality
Quality Evaluation of an Anonymized Dataset (SF, MZI), pp. 3594–3599.
ICPRICPR-2014-SandhanC #hybrid #pattern matching #pattern recognition #recognition
Handling Imbalanced Datasets by Partially Guided Hybrid Sampling for Pattern Recognition (TS, JYC), pp. 1449–1453.
ICPRICPR-2014-WangS #automation #multi #segmentation #using
Automatic Multi-organ Segmentation in Non-enhanced CT Datasets Using Hierarchical Shape Priors (CW, ÖS), pp. 3327–3332.
MLDMMLDM-2014-JavedA #classification #network #social #using
Creation of Bi-lingual Social Network Dataset Using Classifiers (IJ, HA), pp. 523–533.
MLDMMLDM-2014-WaiyamaiS #approach #classification
A Cost-Sensitive Based Approach for Improving Associative Classification on Imbalanced Datasets (KW, PS), pp. 31–42.
HPDCHPDC-2014-SuAWBS #analysis #correlation #distributed #parallel
Supporting correlation analysis on scientific datasets in parallel and distributed settings (YS, GA, JW, AB, HWS), pp. 191–202.
ICDARICDAR-2013-ShivramRSG #named
IBM_UB_1: A Dual Mode Unconstrained English Handwriting Dataset (AS, CR, SS, VG), pp. 13–17.
JCDLJCDL-2013-GozaliKS #library
Constructing an anonymous dataset from the personal digital photo libraries of mac app store users (JPG, MYK, HS), pp. 305–308.
MSRMSR-2013-BinkleyLPHV #identifier
A dataset for evaluating identifier splitters (DB, DL, LLP, EH, KVS), pp. 401–404.
MSRMSR-2013-ButlerWYS #identifier #named
INVocD: identifier name vocabulary dataset (SB, MW, YY, HS), pp. 405–408.
MSRMSR-2013-DitHPK #evaluation #maintenance
A dataset from change history to support evaluation of software maintenance tasks (BD, AH, DP, HHK), pp. 131–134.
MSRMSR-2013-GoeminneCM #ecosystem #gnome
A historical dataset for the gnome ecosystem (MG, MC, TM), pp. 225–228.
MSRMSR-2013-Gousios
The GHTorrent dataset and tool suite (GG), pp. 233–236.
MSRMSR-2013-HamasakiKYCFI #code review #overview #repository #what
Who does what during a code review? datasets of OSS peer review repositories (KH, RGK, NY, AECC, KF, HI), pp. 49–52.
MSRMSR-2013-JanjicHSA #research #reuse #source code
An unabridged source code dataset for research in software reuse (WJ, OH, MS, CA), pp. 339–342.
MSRMSR-2013-LamkanfiPD #debugging #eclipse #fault #mining
The eclipse and mozilla defect tracking dataset: a genuine dataset for mining bug information (AL, JP, SD), pp. 203–206.
MSRMSR-2013-MacLeanK #commit #network #social
Apache commits: social network dataset (ACM, CDK), pp. 135–138.
MSRMSR-2013-RaemaekersDV #dependence #metric #repository
The maven repository dataset of metrics, changes, and dependencies (SR, AvD, JV), pp. 221–224.
MSRMSR-2013-Squire
Project roles in the apache software foundation: a dataset (MS), pp. 301–304.
MSRMSR-2013-Squire13a #twitter
Apache-affiliated twitter screen names: a dataset (MS), pp. 305–308.
MSRMSR-2013-VasilescuSM #re-engineering
A historical dataset of software engineering conferences (BV, AS, TM), pp. 373–376.
MSRMSR-2013-WagstromJS #graph #network #ruby
A network of rails: a graph dataset of ruby on rails and associated projects (PW, CJ, AS), pp. 229–232.
CSCWCSCW-2013-RostBCB #challenge #communication #representation #scalability #social #social media
Representation and communication: challenges in interpreting large social media datasets (MR, LB, HC, BB), pp. 357–362.
CIKMCIKM-2013-GilpinQD #clustering #performance #scalability
Efficient hierarchical clustering of large high dimensional datasets (SG, BQ, ID), pp. 1371–1380.
MLDMMLDM-2013-AllahSG #algorithm #array #mining #performance #scalability
An Efficient and Scalable Algorithm for Mining Maximal — High Confidence Rules from Microarray Dataset (WZAA, YKES, FFMG), pp. 352–366.
MLDMMLDM-2013-ParraL #clustering #using
Unsupervised Tagging of Spanish Lyrics Dataset Using Clustering (FLP, EL), pp. 130–143.
RecSysRecSys-2013-Aiolli #performance #recommendation #scalability
Efficient top-n recommendation for very large scale binary rated datasets (FA), pp. 273–280.
SACSAC-2013-HapfelmeierSK #incremental #linear #performance
Incremental linear model trees on massive datasets: keep it simple, keep it fast (AH, JS, SK), pp. 129–135.
DATEDATE-2013-StergiouJ #optimisation
Optimizing BDDs for time-series dataset manipulation (SS, JJ), pp. 1018–1021.
HPDCHPDC-2013-SuAWMWA #distributed #using
Taming massive distributed datasets: data sampling using bitmap indices (YS, GA, JW, KM, JW, JPA), pp. 13–24.
HPDCHPDC-2013-YinLBGN #order #performance #pipes and filters #using
Efficient analytics on ordered datasets using MapReduce (JY, YL, MB, LG, AN), pp. 125–126.
DRRDRR-2012-WalkerLR #documentation #image
A synthetic document image dataset for developing and evaluating historical document processing methods (DDW, WBL, EKR).
TPDLTPDL-2012-BolandREM #identification
Identifying References to Datasets in Publications (KB, DR, KE, BM), pp. 150–161.
VLDBVLDB-2012-Shirani-MehrKS #evaluation #performance #query #reachability #scalability
Efficient Reachability Query Evaluation in Large Spatiotemporal Contact Datasets (HSM, FBK, CS), pp. 848–859.
CHICHI-2012-FisherPDs #incremental #performance #scalability #trust #visualisation
Trust me, I’m partially right: incremental visualization lets analysts explore large datasets faster (DF, IOP, SMD, MMCS), pp. 1673–1682.
ICPRICPR-2012-BoomHHF #clustering #image #using
Supporting ground-truth annotation of image datasets using clustering (BJB, PXH, JH, RBF), pp. 1542–1545.
ICPRICPR-2012-ChenWY #clustering #graph
Centroid-based clustering for graph datasets (LC, SW, XY), pp. 2144–2147.
ICPRICPR-2012-FausserS #clustering #kernel #scalability
Clustering large datasets with kernel methods (SF, FS), pp. 501–504.
ICPRICPR-2012-MogelmoseTM #comparative #detection #evaluation #learning
Learning to detect traffic signs: Comparative evaluation of synthetic and real-world datasets (AM, MMT, TBM), pp. 3452–3455.
ICPRICPR-2012-NafchiK #image #representation
Rectangular based binary image representation: Theory, applications, and dataset introduction (HZN, HRK), pp. 190–193.
ICPRICPR-2012-TakalaP #identification #named #network #people
CMV100: A dataset for people tracking and re-identification in sparse camera networks (VT, MP), pp. 1387–1390.
ICPRICPR-2012-TanLZ
The dataset system of Economic Dispute handwritten (DSEDH) based on stroke shape and structure features (JT, JHL, XXZ), pp. 661–664.
ICPRICPR-2012-Utasi #classification
Weighted conditional mutual information based boosting for classification of imbalanced datasets (ÁU), pp. 2711–2714.
KDDKDD-2012-ShiA #mobile #recommendation
GetJar mobile application recommendations with very sparse datasets (KS, KA), pp. 204–212.
KDIRKDIR-2012-KharbatBO #algorithm #case study
A New Compaction Algorithm for LCS Rules — Breast Cancer Dataset Case Study (FK, LB, MO), pp. 382–385.
KDIRKDIR-2012-Vanetik #classification
Classification of Datasets with Frequent Itemsets is Wild (NV), pp. 386–389.
MLDMMLDM-2012-JoutsijokiJ #case study
DAGSVM vs. DAGKNN: An Experimental Case Study with Benthic Macroinvertebrate Dataset (HJ, MJ), pp. 439–453.
SIGIRSIGIR-2012-HuO #classification #using
Genre classification for million song dataset using confidence-based classifiers combination (YH, MO), pp. 1083–1084.
SACSAC-2012-GomiI #3d #image #mobile #multi #named
MINI: a 3D mobile image browser with multi-dimensional datasets (AG, TI), pp. 989–996.
HPDCHPDC-2012-HefeedaGA #approximate #clustering #distributed #scalability
Distributed approximate spectral clustering for large-scale datasets (MH, FG, WAA), pp. 223–234.
ICDARICDAR-2011-AlaeiNP #benchmark #documentation #metric #segmentation
A Benchmark Kannada Handwritten Document Dataset and Its Segmentation (AA, PN, UP), pp. 141–145.
ICDARICDAR-2011-QuiniouMSVMPM #named
HAMEX — A Handwritten and Audio Dataset of Mathematical Expressions (SQ, HM, SPS, CVG, EM, SP, SM), pp. 452–456.
SIGMODSIGMOD-2011-DuanKSU #benchmark #comparison #metric #rdf
Apples and oranges: a comparison of RDF benchmarks and real RDF datasets (SD, AK, KS, OU), pp. 145–156.
SIGMODSIGMOD-2011-KashyapP #agile #development #interface #query #xml
Rapid development of web-based query interfaces for XML datasets with QURSED (AK, MP), pp. 1339–1342.
VLDBVLDB-2012-BarskyKWH11 #correlation #mining #scalability #taxonomy
Mining Flipping Correlations from Large Datasets with Taxonomies (MB, SK, TW, JH), pp. 370–381.
HCIOCSC-2011-AliprandiRMTM #rdf #semantics #web #wiki
Extracting Events from Wikipedia as RDF Triples Linked to Widespread Semantic Web Datasets (CA, FR, AM, MT, SM), pp. 90–99.
CIKMCIKM-2011-CachedaCFF #algorithm #analysis #nearest neighbour
Improving k-nearest neighbors algorithms: practical application of dataset analysis (FC, VC, DF, VF), pp. 2253–2256.
CIKMCIKM-2011-LiuYS #query
Subject-oriented top-k hot region queries in spatial dataset (JL, GY, HS), pp. 2409–2412.
CIKMCIKM-2011-SelvarajBSS #classification
Semi-supervised SVMs for classification with unknown class proportions and a small labeled dataset (SKS, BB, SS, SKS), pp. 653–662.
KDDKDD-2011-CordeiroTTLKF #clustering #multi #pipes and filters #scalability
Clustering very large multi-dimensional datasets with MapReduce (RLFC, CTJ, AJMT, JL, UK, CF), pp. 690–698.
SIGIRSIGIR-2011-LeeHWHS #graph #image #learning #multi #pipes and filters #scalability #using
Multi-layer graph-based semi-supervised learning for large-scale image datasets using mapreduce (WYL, LCH, GLW, WHH, YFS), pp. 1121–1122.
SIGMODSIGMOD-2010-WangWLWWLTXL #detection #named
MapDupReducer: detecting near duplicates over massive datasets (CW, JW, XL, WW, HW, HL, WT, JX, RL), pp. 1119–1122.
VLDBVLDB-2010-LiD #probability #ranking
Ranking Continuous Probabilistic Datasets (JL, AD), pp. 638–649.
VLDBVLDB-2010-Matsudaira #3d #biology #scalability
High-End Biological Imaging Generates Very Large 3D+ and Dynamic Datasets (PM), p. 3.
VLDBVLDB-2010-MelnikGLRSTV #analysis #interactive #named
Dremel: Interactive Analysis of Web-Scale Datasets (SM, AG, JJL, GR, SS, MT, TV), pp. 330–339.
VLDBVLDB-2010-ParameswaranGR #concept #scalability #towards #web
Towards The Web of Concepts: Extracting Concepts from Large Datasets (AGP, HGM, AR), pp. 566–577.
MSRMSR-2010-BachmannB #correlation #debugging #process #quality #re-engineering
When process data quality affects the number of bugs: Correlations in software engineering datasets (AB, AB), pp. 62–71.
WCREWCRE-2010-NguyenAH #bias #case study #debugging
A Case Study of Bias in Bug-Fix Datasets (THDN, BA, AEH), pp. 259–268.
CIKMCIKM-2010-CormodeKW #algorithm #scalability #set
Set cover algorithms for very large datasets (GC, HJK, AW), pp. 479–488.
CIKMCIKM-2010-HarpaleYGHY #multi #named #performance #personalisation
CiteData: a new multi-faceted dataset for evaluating personalized search performance (AH, YY, SG, DH, ZY), pp. 549–558.
ICMLICML-2010-SyedR #identification
Unsupervised Risk Stratification in Clinical Datasets: Identifying Patients at Risk of Rare Outcomes (ZS, IR), pp. 1023–1030.
ICMLICML-2010-TanWT #feature model #learning
Learning Sparse SVM for Feature Selection on Very High Dimensional Datasets (MT, LW, IWT), pp. 1047–1054.
ICPRICPR-2010-AlvarezSVO
Perceptual Color Texture Codebooks for Retrieving in Highly Diverse Texture Datasets (, AS, MV, XO), pp. 866–869.
ICPRICPR-2010-KimM #classification
Dense Structure Inference for Object Classification in Aerial LIDAR Dataset (EK, GGM), pp. 3049–3052.
ICPRICPR-2010-PapaCF #classification #optimisation
Optimizing Optimum-Path Forest Classification for Huge Datasets (JPP, FAMC, AXF), pp. 4162–4165.
ICPRICPR-2010-SodaI #composition #integration #learning
Decomposition Methods and Learning Approaches for Imbalanced Dataset: An Experimental Integration (PS, GI), pp. 3117–3120.
ICPRICPR-2010-StottingerZKH #evaluation
FeEval: A Dataset for Evaluation of Spatio-temporal Local Features (JS, SZ, RK, AH), pp. 499–502.
RecSysRecSys-2010-PilaszyZT #feedback #matrix #performance
Fast ALS-based matrix factorization for explicit and implicit feedback datasets (IP, DZ, DT), pp. 71–78.
PLDIPLDI-2010-ChenHEFPTW #optimisation
Evaluating iterative optimization across 1000 datasets (YC, YH, LE, GF, LP, OT, CW), pp. 448–459.
SACSAC-2010-BaralisCC #mining #persistent #scalability
A persistent HY-Tree to efficiently support itemset mining on large datasets (EB, TC, SC), pp. 1060–1064.
SACSAC-2010-LuccheseOP #generative #mining
A generative pattern model for mining binary datasets (CL, SO, RP), pp. 1109–1110.
STOCSTOC-2010-BravermanO #independence
Measuring independence of datasets (VB, RO), pp. 271–280.
ICDARICDAR-2009-AntonacopoulosBPP #analysis #documentation #evaluation #layout #performance
A Realistic Dataset for Performance Evaluation of Document Layout Analysis (AA, DB, CP, SP), pp. 296–300.
CHICHI-2009-LeeSRCT #named #roadmap
FacetLens: exposing trends and relationships to support sensemaking within faceted datasets (BL, GS, GGR, MC, DST), pp. 1293–1302.
CIKMCIKM-2009-BalachandranPK #clustering #configuration management #documentation
Interpretable and reconfigurable clustering of document datasets by deriving word-based rules (VB, DP, DK), pp. 1773–1776.
CIKMCIKM-2009-Muntes-MuleroN #privacy #scalability
Privacy and anonymization for very large datasets (VMM, JN), pp. 2117–2118.
CIKMCIKM-2009-StoyanovichA #clustering
Rank-aware clustering of structured datasets (JS, SAY), pp. 1429–1432.
KDDKDD-2009-DundarHBRR #case study #detection #learning #using
Learning with a non-exhaustive training dataset: a case study: detection of bacteria cultures using optical-scattering technology (MD, EDH, AKB, JPR, BR), pp. 279–288.
MLDMMLDM-2009-CelepcikayEO #using
Regional Pattern Discovery in Geo-referenced Datasets Using PCA (OUC, CFE, CO), pp. 719–733.
MLDMMLDM-2009-SegataB #performance #scalability
Fast Local Support Vector Machines for Large Datasets (NS, EB), pp. 295–310.
ESEC-FSEESEC-FSE-2009-BirdBADBFD #bias #debugging
Fair and balanced?: bias in bug-fix datasets (CB, AB, EA, JD, AB, VF, PTD), pp. 121–130.
SACSAC-2009-OwensMR #mining
Capturing truthiness: mining truth tables in binary datasets (CCOI, TMM, NR), pp. 1467–1474.
TPDLECDL-2008-BindingMT #semantics
Semantic Interoperability in Archaeological Datasets: Data Mapping and Extraction Via the CIDOC CRM (CB, KM, DT), pp. 280–290.
SIGMODSIGMOD-2008-PanZW #clustering #composition #matrix #named #performance #scalability
CRD: fast co-clustering on large datasets utilizing sampling-based matrix decomposition (FP, XZ, WW), pp. 173–184.
EDMEDM-2008-VenturaRH #education #evaluation #framework #metric
Analyzing Rule Evaluation Measures with Educational Datasets: A Framework to Help the Teacher (SV, CR, CH), pp. 177–181.
ICEISICEIS-DISI-2008-MuldnerML #data access #policy #xml
Succinct Access Control Policies for Published XML Datasets (TM, JKM, GL), pp. 380–385.
CIKMCIKM-2008-ChoudharyMB #evolution #on the
On quantifying changes in temporally evolving dataset (RC, SM, AB), pp. 1459–1460.
CIKMCIKM-2008-LiuLNBMG #feature model #performance #preprocessor #realtime #scalability
Real-time data pre-processing technique for efficient feature extraction in large scale datasets (YL, LVL, RSN, KB, PM, CLG), pp. 981–990.
CIKMCIKM-2008-NguyenS #analysis #correlation #performance
Fast correlation analysis on time series datasets (PN, NS), pp. 787–796.
CIKMCIKM-2008-ShawXG #approximate
Deriving non-redundant approximate association rules from hierarchical datasets (GS, YX, SG), pp. 1451–1452.
ICMLICML-2008-WolfeHK #distributed #scalability
Fully distributed EM for very large datasets (JW, AH, DK), pp. 1184–1191.
ICPRICPR-2008-WatanabeK08a #scalability
RANSAC-SVM for large-scale datasets (KW, TK), pp. 1–4.
KDDKDD-2008-DasSN #category theory #detection
Anomaly pattern detection in categorical datasets (KD, JGS, DBN), pp. 169–176.
FSEFSE-2008-OsterweilCEPWBH #experience #process #using #workflow
Experience in using a process language to define scientific workflow and generate dataset provenance (LJO, LAC, AME, RMP, AEW, ERB, JLH), pp. 319–329.
SIGMODSIGMOD-2007-XiaoT #named #privacy #towards
M-invariance: towards privacy preserving re-publication of dynamic datasets (XX, YT), pp. 689–700.
HCIHIMI-IIE-2007-CollinsNMRBCPFHP #collaboration
The Karst Collaborative Workspace for Analyzing and Annotating Scientific Datasets (LMC, DEN, MLBM, JVR, MAB, CRC, JEP, BFS, SKH, JCP), pp. 3–12.
KDDKDD-2007-DasS #category theory #detection
Detecting anomalous records in categorical datasets (KD, JGS), pp. 220–229.
PADLPADL-2007-Costa #performance #prolog
Prolog Performance on Larger Datasets (VSC), pp. 185–199.
DRRDRR-2006-ZhangA #towards
Toward quantifying the amount of style in a dataset (XZ, SA).
SIGMODSIGMOD-2006-KiferG #injection
Injecting utility into anonymized datasets (DK, JG), pp. 217–228.
VLDBVLDB-2006-GemullaLH #evolution #maintenance
A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets (RG, WL, PJH), pp. 595–606.
VLDBVLDB-2006-JiTT #3d #mining
Mining Frequent Closed Cubes in 3D Datasets (LJ, KLT, AKHT), pp. 811–822.
ICPRICPR-v1-2006-AnC
Finding Rule Groups to Classify High Dimensional Gene Expression Datasets (JA, YPPC), pp. 1196–1199.
ICPRICPR-v3-2006-FelsbergG #multi #named #robust #scalability
P-Channels: Robust Multivariate M-Estimation of Large Datasets (MF, GHG), pp. 262–267.
ICPRICPR-v4-2006-Jun #comparison #detection
A Peer Dataset Comparison Outlier Detection Model Applied to Financial Surveillance (TJ), pp. 900–903.
SACSAC-2006-FangG #bound
Boundary surface extraction and rendering for volume datasets (SF, PG), pp. 1356–1360.
SACSAC-2006-NemalhabibS #algorithm #category theory #clustering #named
CLUC: a natural clustering algorithm for categorical datasets based on cohesion (AN, NS), pp. 637–638.
ICDARICDAR-2005-AgrawalBMV #named #online #representation #xml
UPX: A New XML Representation for Annotated Datasets of Online Handwriting Data (MA, KB, SM, LV), pp. 1161–1165.
ICEISICEIS-v2-2005-DoP #mining #scalability #visualisation
Mining Very Large Datasets with SVM and Visualization (TND, FP), pp. 127–141.
KDDKDD-2005-JinWPPA #graph
Discovering frequent topological structures from graph datasets (RJ, CW, DP, SP, GA), pp. 606–611.
KDDKDD-2005-ZakiPAS #algorithm #category theory #clustering #effectiveness #mining #named
CLICKS: an effective algorithm for mining subspace clusters in categorical datasets (MJZ, MP, IA, TS), pp. 736–742.
MLDMMLDM-2005-FerrandizB05a #evaluation
Supervised Evaluation of Dataset Partitions: Advantages and Practice (SF, MB), pp. 600–609.
MLDMMLDM-2005-SiaL #clustering #scalability #using
Clustering Large Dynamic Datasets Using Exemplar Points (WS, MML), pp. 163–173.
SIGMODSIGMOD-2004-CongXPTY #array #named
FARMER: Finding Interesting Rule Groups in Microarray Datasets (GC, AKHT, XX, FP, JY), pp. 143–154.
VLDBVLDB-2004-HoweM #algebra
Algebraic Manipulation of Scientific Datasets (BH, DM), pp. 924–935.
CIKMCIKM-2004-ChenL #clustering #named #scalability #visualisation
ClusterMap: labeling clusters in large datasets via visualization (KC, LL), pp. 285–293.
CIKMCIKM-2004-ChungJM #clustering #mining #using
Mining gene expression datasets using density-based clustering (SC, JJ, DM), pp. 150–151.
ICPRICPR-v4-2004-WillisC #3d #multi #symmetry
Alignment of Multiple Non-Overlapping Axially Symmetric 3D Datasets (ARW, DBC), pp. 96–99.
KDDKDD-2004-TruongLB #learning #random #using
Learning a complex metabolomic dataset using random forests and support vector machines (YT, XL, CB), pp. 835–840.
SIGIRSIGIR-2004-DavidovGM #categorisation #generative
Parameterized generation of labeled datasets for text categorization based on a hierarchical directory (DD, EG, SM), pp. 250–257.
SACSAC-2004-AdamJA #detection
Neighborhood based detection of anomalies in high dimensional spatio-temporal sensor datasets (NRA, VPJ, VA), pp. 576–583.
SACSAC-2004-CarswellGN #multi #semantics #transaction
Wireless spatio-semantic transactions on multimedia datasets (JDC, KG, MN), pp. 1201–1205.
DRRDRR-2003-HauserSSDST
Correcting OCR text by association with historical datasets (SEH, JS, TFS, DDF, SS, GRT), pp. 84–93.
VLDBVLDB-2003-LinLYZ #multi #scalability
Multiscale Histograms: Summarizing Topological Relations in Large Spatial Datasets (XL, QL, YY, XZ), pp. 814–825.
ICEISICEIS-v2-2003-DoP #algorithm #mining #scalability
Mining Very Large Datasets with Support Vector Machine Algorithms (TND, FP), pp. 140–147.
ICMLICML-2003-LeskovecS #linear #programming
Linear Programming Boosting for Uneven Datasets (JL, JST), pp. 456–463.
ICMLICML-2003-ZhuWC #scalability
Eliminating Class Noise in Large Datasets (XZ, XW, QC), pp. 920–927.
KDDKDD-2003-El-HajjZ #interactive #matrix #mining #performance #scalability
Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining (MEH, ORZ), pp. 109–118.
KDDKDD-2003-KoyuturkG #framework #named
PROXIMUS: a framework for analyzing very high dimensional discrete-attributed datasets (MK, AG), pp. 147–156.
KDDKDD-2003-PanCTYZ #biology #named
Carpenter: finding closed patterns in long biological datasets (FP, GC, AKHT, JY, MJZ), pp. 637–642.
KDDKDD-2003-PeterCG #algorithm #clustering #scalability
New unsupervised clustering algorithm for large datasets (WP, JC, CG), pp. 643–648.
MLDMMLDM-2003-LeleuRBE #mining #named
GO-SPADE: Mining Sequential Patterns over Datasets with Consecutive Repetitions (ML, CR, JFB, GE), pp. 293–306.
SACSAC-2003-ChenL #clustering #visualisation
Cluster Rendering of Skewed Datasets via Visualization (KC, LL), pp. 909–916.
CIKMCIKM-2002-ZhaoK #algorithm #clustering #documentation #evaluation
Evaluation of hierarchical clustering algorithms for document datasets (YZ, GK), pp. 515–524.
KDDKDD-2002-RidgewayM #analysis
Bayesian analysis of massive datasets via particle filters (GR, DM), pp. 5–13.
KDDKDD-2002-TantrumMS #clustering #modelling #scalability
Hierarchical model-based clustering of large datasets through fractionation and refractionation (JT, AM, WS), pp. 183–190.
ICEISICEIS-v1-2001-KotsisWFM #multi #novel #visualisation
Novel Data Visualisation and Exploration in Multidimensional Datasets (NK, GRSW, JDF, DRM), pp. 170–175.
ICMLICML-2001-DomeniconiG #approach #approximate #classification #multi #nearest neighbour #performance #query #scalability
An Efficient Approach for Approximating Multi-dimensional Range Queries and Nearest Neighbor Classification in Large Datasets (CD, DG), pp. 98–105.
KDDKDD-2001-BeygelzimerPM #category theory #performance #scalability #visualisation
Fast ordering of large categorical datasets for better visualization (AB, CSP, SM), pp. 239–244.
KDDKDD-2000-BarbaraC #clustering #using
Using the fractal dimension to cluster datasets (DB, PC), pp. 260–264.
KDDKDD-2000-Yang #3d #interactive #relational #scalability
Interactive exploration of very large relational datasets through 3D dynamic projections (LY), pp. 236–243.
KDDKDD-2000-ZhangDR #constraints #scalability
Exploring constraints to efficiently mine emerging patterns from large high-dimensional datasets (XZ, GD, KR), pp. 310–314.
SIGMODSIGMOD-1999-MankuRL #online #order #performance #random #scalability #statistics
Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets (GSM, SR, BGL), pp. 251–262.
KDDKDD-1999-DaviesM #network
Bayesian Networks for Lossless Dataset Compression (SD, AWM), pp. 387–391.
VLDBVLDB-1998-GehrkeRG #framework #named #performance #scalability
RainForest — A Framework for Fast Decision Tree Construction of Large Datasets (JG, RR, VG), pp. 416–427.
VLDBVLDB-1998-KnorrN #algorithm #mining #scalability
Algorithms for Mining Distance-Based Outliers in Large Datasets (EMK, RTN), pp. 392–403.
VLDBVLDB-1998-ShuklaDN #multi
Materialized View Selection for Multidimensional Datasets (AS, PD, JFN), pp. 488–499.
KDDKDD-1998-AlsabtiRS #classification #named #scalability
CLOUDS: A Decision Tree Classifier for Large Datasets (KA, SR, VS), pp. 2–8.
KDDKDD-1998-OatesJ #modelling #scalability
Large Datasets Lead to Overly Complex Models: An Explanation and a Solution (TO, DJ), pp. 294–298.
SIGMODSIGMOD-1997-KornJF #ad hoc #query #scalability #sequence
Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences (FK, HVJ, CF), pp. 289–300.
SIGMODSIGMOD-1997-LivnyRBCDLMW #named #query #scalability #visualisation
DEVise: Integrated Querying and Visualization of Large Datasets (ML, RR, KSB, GC, DD, SL, JM, RKW), pp. 301–312.
SIGMODSIGMOD-1997-LivnyRBCDLMW97a #named #query #scalability #visual notation
DEVise: Integrated Querying and Visual Exploration of Large Datasets (ML, RR, KSB, GC, DD, SL, JM, RKW), pp. 517–520.
KDDKDD-1997-ZupanBBC #approach #composition #data mining #mining
A Dataset Decomposition Approach to Data Mining and Machine Discovery (BZ, MB, IB, BC), pp. 299–302.
SIGMODSIGMOD-1995-FaloutsosL #algorithm #multi #named #performance #visualisation
FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets (CF, KIL), pp. 163–174.
KDDKDD-1995-StolorzNMMSSYNCMF #data mining #mining #performance #scalability
Fast Spatio-Temporal Data Mining of Large Geophysical Datasets (PES, HN, EM, RRM, ECS, JRS, JY, KWN, SYC, CRM, JDF), pp. 300–305.

Bibliography of Software Language Engineering in Generated Hypertext (BibSLEIGH) is created and maintained by Dr. Vadim Zaytsev.
Hosted as a part of SLEBOK on GitHub.