BibSLEIGH
Used together with:
larg (48)
base (22)
cluster (21)
mine (19)
evalu (18)

Stem dataset$ (all stems)

205 papers:

DRRDRR-2015-ChenSWLHI #analysis #dataset #documentation #layout
Ground truth model, tool, and dataset for layout analysis of historical documents (KC, MS, HW, ML, JH, RI), p. 940204.
HTHT-2015-RoutB #algorithm #dataset #ranking #twitter
A Human-annotated Dataset for Evaluating Tweet Ranking Algorithms (DPR, KB), pp. 95–99.
PODSPODS-2015-Cormode #dataset #scalability #summary
Compact Summaries over Large Datasets (GC), pp. 157–158.
VLDBVLDB-2015-BhattacherjeeCH #dataset #trade-off #version control
Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff (SB, AC, SH, AD, AGP), pp. 1346–1357.
VLDBVLDB-2015-HarbiAKM #dataset #query #rdf
Evaluating SPARQL Queries on Massive RDF Datasets (RH, IA, PK, NM), pp. 1848–1859.
ICSMEICSME-2015-JansenH #dataset #industrial #smell #spreadsheet
Code smells in spreadsheet formulas revisited on an industrial dataset (BJ, FH), pp. 372–380.
MSRMSR-2015-AltingerSDW #dataset #embedded #fault #industrial #modelling #novel #predict
A Novel Industry Grade Dataset for Fault Prediction Based on Model-Driven Developed Automotive Embedded Software (HA, SS, YD, FW), pp. 494–497.
MSRMSR-2015-GermanAH #dataset #git #linux #process
A Dataset of the Activity of the Git Super-repository of Linux in 2012 (DMG, BA, AEH), pp. 470–473.
MSRMSR-2015-HabayebMMBB #dataset #fault
The Firefox Temporal Defect Dataset (MH, AVM, SSM, LB, AB), pp. 498–501.
MSRMSR-2015-KrutzMMRPFS #android #dataset #open source
A Dataset of Open-Source Android Applications (DEK, MM, SAM, AR, JP, AF, JS), pp. 522–525.
MSRMSR-2015-MauczkaBSG #commit #dataset #developer
Dataset of Developer-Labeled Commit Messages (AM, FB, CS, TG), pp. 490–493.
MSRMSR-2015-OhiraKYYMLFHIM #classification #dataset #debugging
A Dataset of High Impact Bugs: Manually-Classified Issue Reports (MO, YK, YY, HY, YM, NL, KF, HH, AI, KiM), pp. 518–521.
MSRMSR-2015-PalombaNTBOPL #dataset #evaluation #named #open data #smell
Landfill: An Open Dataset of Code Smells with Public Evaluation (FP, DDN, MT, GB, RO, DP, ADL), pp. 482–485.
MSRMSR-2015-SawantB #api #dataset
A Dataset for API Usage (AAS, AB), pp. 506–509.
MSRMSR-2015-WermelingerY #architecture #dataset #evolution
An Architectural Evolution Dataset (MW, YY), pp. 502–505.
MSRMSR-2015-Zacchiroli #dataset #metadata #source code
The Debsources Dataset: Two Decades of Debian Source Code Metadata (SZ), pp. 466–469.
SCAMSCAM-2015-AivaloglouHH #dataset #scalability #spreadsheet
A grammar for spreadsheet formulas evaluated on two large datasets (EA, DH, FH), pp. 121–130.
CSCWCSCW-2015-QuattroneCM #bias #dataset
There’s No Such Thing as the Perfect Map: Quantifying Bias in Spatial Crowd-sourcing Datasets (GQ, LC, PDM), pp. 1021–1032.
ICEISICEIS-v2-2015-SarinhoLS #dataset #linked data #open data #question
Can You Find All the Data You Expect in a Linked Dataset? (WTS, BFL, DS), pp. 648–655.
ICMLICML-2015-BarbosaENW #dataset #distributed #power of
The Power of Randomization: Distributed Submodular Maximization on Massive Datasets (RdPB, AE, HLN, JW), pp. 1236–1244.
ICMLICML-2015-MaLF #analysis #canonical #correlation #dataset #linear #scalability
Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis (ZM, YL, DPF), pp. 169–178.
KDDKDD-2015-CaoWYR #dataset #online #scalability
Online Outlier Exploration Over Large Datasets (LC, MW, DY, EAR), pp. 89–98.
RecSysRecSys-2015-Ben-ShimonTFSRH #challenge #dataset
RecSys Challenge 2015 and the YOOCHOOSE Dataset (DBS, AT, MF, BS, LR, JH), pp. 357–358.
SIGIRSIGIR-2015-MorenoD #adaptation #dataset #metric #semistructured data
Adapted B-CUBED Metrics to Unbalanced Datasets (JGM, GD), pp. 911–914.
SACSAC-2015-RochaRCOMVADGF #algorithm #classification #dataset #documentation #named #performance #using
G-KNN: an efficient document classification algorithm for sparse datasets on GPUs using KNN (LCdR, GSR, RC, RSO, DM, FV, GA, SD, MAG, RF), pp. 1335–1338.
ICSEICSE-v2-2015-HermansM #analysis #dataset #email #spreadsheet
Enron’s Spreadsheets and Related Emails: A Dataset and Analysis (FH, ERMH), pp. 7–16.
DRRDRR-2014-BrunoL #dataset #documentation #open data #recognition #research
The Lehigh Steel Collection: a new open dataset for document recognition research (BB, DPL), p. ?–9.
SIGMODSIGMOD-2014-SatishSPSPHSYD #dataset #framework #graph #navigation #using
Navigating the maze of graph analytics frameworks using massive graph datasets (NS, NS, MMAP, JS, JP, MAH, SS, ZY, PD), pp. 979–990.
VLDBVLDB-2015-MozafariSFJM14 #dataset #learning #scalability
Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning (BM, PS, MJF, MIJ, SM), pp. 125–136.
ICSMEICSME-2014-ThongtanunamYYKCFI #bibliography #code review #dataset #named #visualisation
ReDA: A Web-Based Visualization Tool for Analyzing Modern Code Review Dataset (PT, XY, NY, RGK, AECC, KF, HI), pp. 605–608.
MSRMSR-2014-GousiosZ #dataset #development #research
A dataset for pull-based development research (GG, AZ), pp. 368–371.
MSRMSR-2014-LazarRS14a #dataset #debugging #generative
Generating duplicate bug datasets (AL, SR, BS), pp. 392–395.
MSRMSR-2014-MurakamiHK #dataset
A dataset of clone references with gaps (HM, YH, SK), pp. 412–415.
MSRMSR-2014-PassosC #dataset #feature model #kernel #linux
A dataset of feature additions and feature removals from the Linux kernel (LTP, KC), pp. 376–379.
MSRMSR-2014-RoblesRSVG #bibliography #challenge #dataset
FLOSS 2013: a survey dataset about free software contributors: challenges for curating, sharing, and combining (GR, LAR, AS, BV, JMGB), pp. 396–399.
MSRMSR-2014-SainiSOL #dataset #debugging
A dataset for maven artifacts and bug patterns found in them (VS, HS, JO, CVL), pp. 416–419.
MSRMSR-2014-WilliamsRMRK #dataset #modelling
Models of OSS project meta-information: a dataset of three forges (JRW, DDR, NDM, JDR, DSK), pp. 408–411.
MSRMSR-2014-ZhangH #dataset #energy #mining
A green miner’s dataset: mining the impact of software change on energy consumption (CZ, AH), pp. 400–403.
HCIHCI-AIMT-2014-RuffieuxLMK #bibliography #dataset #gesture #recognition
A Survey of Datasets for Human Gesture Recognition (SR, DL, EM, OAK), pp. 337–348.
HCIHIMI-DE-2014-GombosK #dataset #query #recommendation
SPARQL Query Writing with Recommendations Based on Datasets (GG, AK), pp. 310–319.
ICEISICEIS-v1-2014-TimoteoVF #analysis #case study #dataset #network #project management
Evaluating Artificial Neural Networks and Traditional Approaches for Risk Analysis in Software Project Management — A Case Study with PERIL Dataset (CT, MV, SF), pp. 472–479.
ECIRECIR-2014-BelloginSVS #challenge #dataset #evaluation #web
Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track (AB, TS, APdV, AS), pp. 430–436.
ECIRECIR-2014-CarageaWCWRCWG #big data #dataset
CiteSeerX: A Scholarly Big Dataset (CC, JW, AMC, KW, JPFR, HHC, ZW, CLG), pp. 311–322.
ICPRICPR-2014-Calvo-ZaragozaO #dataset #music #recognition
Recognition of Pen-Based Music Notation: The HOMUS Dataset (JCZ, JO), pp. 3038–3043.
ICPRICPR-2014-FletcherI #dataset #evaluation #quality
Quality Evaluation of an Anonymized Dataset (SF, MZI), pp. 3594–3599.
ICPRICPR-2014-SandhanC #dataset #hybrid #pattern matching #pattern recognition #recognition
Handling Imbalanced Datasets by Partially Guided Hybrid Sampling for Pattern Recognition (TS, JYC), pp. 1449–1453.
ICPRICPR-2014-WangS #automation #dataset #multi #segmentation #using
Automatic Multi-organ Segmentation in Non-enhanced CT Datasets Using Hierarchical Shape Priors (CW, ÖS), pp. 3327–3332.
MLDMMLDM-2014-JavedA #classification #dataset #network #social #using
Creation of Bi-lingual Social Network Dataset Using Classifiers (IJ, HA), pp. 523–533.
MLDMMLDM-2014-WaiyamaiS #approach #classification #dataset
A Cost-Sensitive Based Approach for Improving Associative Classification on Imbalanced Datasets (KW, PS), pp. 31–42.
HPDCHPDC-2014-SuAWBS #analysis #correlation #dataset #distributed #parallel
Supporting correlation analysis on scientific datasets in parallel and distributed settings (YS, GA, JW, AB, HWS), pp. 191–202.
DATEDATE-2013-StergiouJ #dataset #optimisation
Optimizing BDDs for time-series dataset manipulation (SS, JJ), pp. 1018–1021.
ICDARICDAR-2013-ShivramRSG #dataset #named
IBM_UB_1: A Dual Mode Unconstrained English Handwriting Dataset (AS, CR, SS, VG), pp. 13–17.
MSRMSR-2013-BinkleyLPHV #dataset #identifier
A dataset for evaluating identifier splitters (DB, DL, LLP, EH, KVS), pp. 401–404.
MSRMSR-2013-ButlerWYS #dataset #identifier #named
INVocD: identifier name vocabulary dataset (SB, MW, YY, HS), pp. 405–408.
MSRMSR-2013-DitHPK #dataset #evaluation #maintenance
A dataset from change history to support evaluation of software maintenance tasks (BD, AH, DP, HHK), pp. 131–134.
MSRMSR-2013-GoeminneCM #dataset #ecosystem #gnome
A historical dataset for the gnome ecosystem (MG, MC, TM), pp. 225–228.
MSRMSR-2013-Gousios #dataset
The GHTorrent dataset and tool suite (GG), pp. 233–236.
MSRMSR-2013-HamasakiKYCFI #bibliography #code review #dataset #repository #what
Who does what during a code review? datasets of OSS peer review repositories (KH, RGK, NY, AECC, KF, HI), pp. 49–52.
MSRMSR-2013-JanjicHSA #dataset #research #reuse #source code
An unabridged source code dataset for research in software reuse (WJ, OH, MS, CA), pp. 339–342.
MSRMSR-2013-LamkanfiPD #dataset #debugging #eclipse #fault #mining
The eclipse and mozilla defect tracking dataset: a genuine dataset for mining bug information (AL, JP, SD), pp. 203–206.
MSRMSR-2013-MacLeanK #commit #dataset #network #social
Apache commits: social network dataset (ACM, CDK), pp. 135–138.
MSRMSR-2013-RaemaekersDV #dataset #dependence #metric #repository
The maven repository dataset of metrics, changes, and dependencies (SR, AvD, JV), pp. 221–224.
MSRMSR-2013-Squire #dataset
Project roles in the apache software foundation: a dataset (MS), pp. 301–304.
MSRMSR-2013-Squire13a #dataset #twitter
Apache-affiliated twitter screen names: a dataset (MS), pp. 305–308.
MSRMSR-2013-VasilescuSM #dataset #re-engineering
A historical dataset of software engineering conferences (BV, AS, TM), pp. 373–376.
MSRMSR-2013-WagstromJS #dataset #graph #network #ruby
A network of rails: a graph dataset of ruby on rails and associated projects (PW, CJ, AS), pp. 229–232.
CSCWCSCW-2013-RostBCB #challenge #communication #dataset #representation #scalability #social #social media
Representation and communication: challenges in interpreting large social media datasets (MR, LB, HC, BB), pp. 357–362.
CIKMCIKM-2013-GilpinQD #clustering #dataset #performance #scalability
Efficient hierarchical clustering of large high dimensional datasets (SG, BQ, ID), pp. 1371–1380.
MLDMMLDM-2013-AllahSG #algorithm #array #dataset #mining #performance #scalability
An Efficient and Scalable Algorithm for Mining Maximal — High Confidence Rules from Microarray Dataset (WZAA, YKES, FFMG), pp. 352–366.
MLDMMLDM-2013-ParraL #clustering #dataset #using
Unsupervised Tagging of Spanish Lyrics Dataset Using Clustering (FLP, EL), pp. 130–143.
RecSysRecSys-2013-Aiolli #dataset #performance #recommendation #scalability
Efficient top-n recommendation for very large scale binary rated datasets (FA), pp. 273–280.
SACSAC-2013-HapfelmeierSK #dataset #incremental #linear #performance
Incremental linear model trees on massive datasets: keep it simple, keep it fast (AH, JS, SK), pp. 129–135.
HPDCHPDC-2013-SuAWMWA #dataset #distributed #using
Taming massive distributed datasets: data sampling using bitmap indices (YS, GA, JW, KM, JW, JPA), pp. 13–24.
HPDCHPDC-2013-YinLBGN #dataset #order #performance #pipes and filters #using
Efficient analytics on ordered datasets using MapReduce (JY, YL, MB, LG, AN), pp. 125–126.
DRRDRR-2012-WalkerLR #dataset #documentation #image
A synthetic document image dataset for developing and evaluating historical document processing methods (DDW, WBL, EKR).
VLDBVLDB-2012-Shirani-MehrKS #dataset #evaluation #performance #query #reachability #scalability
Efficient Reachability Query Evaluation in Large Spatiotemporal Contact Datasets (HSM, FBK, CS), pp. 848–859.
CHICHI-2012-FisherPDs #dataset #incremental #performance #scalability #trust #visualisation
Trust me, I’m partially right: incremental visualization lets analysts explore large datasets faster (DF, IOP, SMD, MMCS), pp. 1673–1682.
ICPRICPR-2012-BoomHHF #clustering #dataset #image #using
Supporting ground-truth annotation of image datasets using clustering (BJB, PXH, JH, RBF), pp. 1542–1545.
ICPRICPR-2012-ChenWY #clustering #dataset #graph
Centroid-based clustering for graph datasets (LC, SW, XY), pp. 2144–2147.
ICPRICPR-2012-FausserS #clustering #dataset #kernel #scalability
Clustering large datasets with kernel methods (SF, FS), pp. 501–504.
ICPRICPR-2012-MogelmoseTM #comparative #dataset #detection #evaluation #learning
Learning to detect traffic signs: Comparative evaluation of synthetic and real-world datasets (AM, MMT, TBM), pp. 3452–3455.
ICPRICPR-2012-NafchiK #dataset #image #representation
Rectangular based binary image representation: Theory, applications, and dataset introduction (HZN, HRK), pp. 190–193.
ICPRICPR-2012-TakalaP #dataset #identification #named #network #people
CMV100: A dataset for people tracking and re-identification in sparse camera networks (VT, MP), pp. 1387–1390.
ICPRICPR-2012-TanLZ #dataset
The dataset system of Economic Dispute handwritten (DSEDH) based on stroke shape and structure features (JT, JHL, XXZ), pp. 661–664.
ICPRICPR-2012-Utasi #classification #dataset
Weighted conditional mutual information based boosting for classification of imbalanced datasets (ÁU), pp. 2711–2714.
KDDKDD-2012-ShiA #dataset #mobile #recommendation
GetJar mobile application recommendations with very sparse datasets (KS, KA), pp. 204–212.
KDIRKDIR-2012-KharbatBO #algorithm #case study #dataset
A New Compaction Algorithm for LCS Rules — Breast Cancer Dataset Case Study (FK, LB, MO), pp. 382–385.
KDIRKDIR-2012-Vanetik #classification #dataset
Classification of Datasets with Frequent Itemsets is Wild (NV), pp. 386–389.
MLDMMLDM-2012-JoutsijokiJ #case study #dataset
DAGSVM vs. DAGKNN: An Experimental Case Study with Benthic Macroinvertebrate Dataset (HJ, MJ), pp. 439–453.
SIGIRSIGIR-2012-HuO #classification #dataset #using
Genre classification for million song dataset using confidence-based classifiers combination (YH, MO), pp. 1083–1084.
SACSAC-2012-GomiI #3d #dataset #image #mobile #multi #named
MINI: a 3D mobile image browser with multi-dimensional datasets (AG, TI), pp. 989–996.
HPDCHPDC-2012-HefeedaGA #approximate #clustering #dataset #distributed #scalability
Distributed approximate spectral clustering for large-scale datasets (MH, FG, WAA), pp. 223–234.
ICDARICDAR-2011-AlaeiNP #benchmark #dataset #documentation #metric #segmentation
A Benchmark Kannada Handwritten Document Dataset and Its Segmentation (AA, PN, UP), pp. 141–145.
ICDARICDAR-2011-QuiniouMSVMPM #dataset #named
HAMEX — A Handwritten and Audio Dataset of Mathematical Expressions (SQ, HM, SPS, CVG, EM, SP, SM), pp. 452–456.
SIGMODSIGMOD-2011-DuanKSU #benchmark #comparison #dataset #metric #rdf
Apples and oranges: a comparison of RDF benchmarks and real RDF datasets (SD, AK, KS, OU), pp. 145–156.
SIGMODSIGMOD-2011-KashyapP #agile #dataset #development #interface #query #xml
Rapid development of web-based query interfaces for XML datasets with QURSED (AK, MP), pp. 1339–1342.
VLDBVLDB-2012-BarskyKWH11 #correlation #dataset #mining #scalability #taxonomy
Mining Flipping Correlations from Large Datasets with Taxonomies (MB, SK, TW, JH), pp. 370–381.
HCIOCSC-2011-AliprandiRMTM #dataset #rdf #semantics #web #wiki
Extracting Events from Wikipedia as RDF Triples Linked to Widespread Semantic Web Datasets (CA, FR, AM, MT, SM), pp. 90–99.
CIKMCIKM-2011-CachedaCFF #algorithm #analysis #dataset #nearest neighbour
Improving k-nearest neighbors algorithms: practical application of dataset analysis (FC, VC, DF, VF), pp. 2253–2256.
CIKMCIKM-2011-LiuYS #dataset #query
Subject-oriented top-k hot region queries in spatial dataset (JL, GY, HS), pp. 2409–2412.
CIKMCIKM-2011-SelvarajBSS #classification #dataset
Semi-supervised SVMs for classification with unknown class proportions and a small labeled dataset (SKS, BB, SS, SKS), pp. 653–662.
KDDKDD-2011-CordeiroTTLKF #clustering #dataset #multi #pipes and filters #scalability
Clustering very large multi-dimensional datasets with MapReduce (RLFC, CTJ, AJMT, JL, UK, CF), pp. 690–698.
SIGIRSIGIR-2011-LeeHWHS #dataset #graph #image #learning #multi #pipes and filters #scalability #using
Multi-layer graph-based semi-supervised learning for large-scale image datasets using mapreduce (WYL, LCH, GLW, WHH, YFS), pp. 1121–1122.
SIGMODSIGMOD-2010-WangWLWWLTXL #dataset #detection #named
MapDupReducer: detecting near duplicates over massive datasets (CW, JW, XL, WW, HW, HL, WT, JX, RL), pp. 1119–1122.
VLDBVLDB-2010-LiD #dataset #probability #ranking
Ranking Continuous Probabilistic Datasets (JL, AD), pp. 638–649.
VLDBVLDB-2010-Matsudaira #3d #biology #dataset #scalability
High-End Biological Imaging Generates Very Large 3D+ and Dynamic Datasets (PM), p. 3.
VLDBVLDB-2010-MelnikGLRSTV #analysis #dataset #interactive #named
Dremel: Interactive Analysis of Web-Scale Datasets (SM, AG, JJL, GR, SS, MT, TV), pp. 330–339.
VLDBVLDB-2010-ParameswaranGR #concept #dataset #scalability #towards #web
Towards The Web of Concepts: Extracting Concepts from Large Datasets (AGP, HGM, AR), pp. 566–577.
MSRMSR-2010-BachmannB #correlation #dataset #debugging #process #quality #re-engineering
When process data quality affects the number of bugs: Correlations in software engineering datasets (AB, AB), pp. 62–71.
WCREWCRE-2010-NguyenAH #bias #case study #dataset #debugging
A Case Study of Bias in Bug-Fix Datasets (THDN, BA, AEH), pp. 259–268.
PLDIPLDI-2010-ChenHEFPTW #dataset #optimisation
Evaluating iterative optimization across 1000 datasets (YC, YH, LE, GF, LP, OT, CW), pp. 448–459.
STOCSTOC-2010-BravermanO #dataset #independence
Measuring independence of datasets (VB, RO), pp. 271–280.
CIKMCIKM-2010-CormodeKW #algorithm #dataset #scalability #set
Set cover algorithms for very large datasets (GC, HJK, AW), pp. 479–488.
CIKMCIKM-2010-HarpaleYGHY #dataset #multi #named #performance #personalisation
CiteData: a new multi-faceted dataset for evaluating personalized search performance (AH, YY, SG, DH, ZY), pp. 549–558.
ICMLICML-2010-SyedR #dataset #identification
Unsupervised Risk Stratification in Clinical Datasets: Identifying Patients at Risk of Rare Outcomes (ZS, IR), pp. 1023–1030.
ICMLICML-2010-TanWT #dataset #feature model #learning
Learning Sparse SVM for Feature Selection on Very High Dimensional Datasets (MT, LW, IWT), pp. 1047–1054.
ICPRICPR-2010-AlvarezSVO #dataset
Perceptual Color Texture Codebooks for Retrieving in Highly Diverse Texture Datasets (, AS, MV, XO), pp. 866–869.
ICPRICPR-2010-KimM #classification #dataset
Dense Structure Inference for Object Classification in Aerial LIDAR Dataset (EK, GGM), pp. 3049–3052.
ICPRICPR-2010-PapaCF #classification #dataset #optimisation
Optimizing Optimum-Path Forest Classification for Huge Datasets (JPP, FAMC, AXF), pp. 4162–4165.
ICPRICPR-2010-SodaI #composition #dataset #integration #learning
Decomposition Methods and Learning Approaches for Imbalanced Dataset: An Experimental Integration (PS, GI), pp. 3117–3120.
ICPRICPR-2010-StottingerZKH #dataset #evaluation
FeEval: A Dataset for Evaluation of Spatio-temporal Local Features (JS, SZ, RK, AH), pp. 499–502.
RecSysRecSys-2010-PilaszyZT #dataset #feedback #matrix #performance
Fast als-based matrix factorization for explicit and implicit feedback datasets (IP, DZ, DT), pp. 71–78.
SACSAC-2010-BaralisCC #dataset #mining #persistent #scalability
A persistent HY-Tree to efficiently support itemset mining on large datasets (EB, TC, SC), pp. 1060–1064.
SACSAC-2010-LuccheseOP #dataset #generative #mining
A generative pattern model for mining binary datasets (CL, SO, RP), pp. 1109–1110.
ICDARICDAR-2009-AntonacopoulosBPP #analysis #dataset #documentation #evaluation #layout #performance
A Realistic Dataset for Performance Evaluation of Document Layout Analysis (AA, DB, CP, SP), pp. 296–300.
CHICHI-2009-LeeSRCT #dataset #named #roadmap
FacetLens: exposing trends and relationships to support sensemaking within faceted datasets (BL, GS, GGR, MC, DST), pp. 1293–1302.
CIKMCIKM-2009-BalachandranPK #clustering #configuration management #dataset #documentation
Interpretable and reconfigurable clustering of document datasets by deriving word-based rules (VB, DP, DK), pp. 1773–1776.
CIKMCIKM-2009-Muntes-MuleroN #dataset #privacy #scalability
Privacy and anonymization for very large datasets (VMM, JN), pp. 2117–2118.
CIKMCIKM-2009-StoyanovichA #clustering #dataset
Rank-aware clustering of structured datasets (JS, SAY), pp. 1429–1432.
KDDKDD-2009-DundarHBRR #case study #dataset #detection #learning #using
Learning with a non-exhaustive training dataset: a case study: detection of bacteria cultures using optical-scattering technology (MD, EDH, AKB, JPR, BR), pp. 279–288.
MLDMMLDM-2009-CelepcikayEO #dataset #using
Regional Pattern Discovery in Geo-referenced Datasets Using PCA (OUC, CFE, CO), pp. 719–733.
MLDMMLDM-2009-SegataB #dataset #performance #scalability
Fast Local Support Vector Machines for Large Datasets (NS, EB), pp. 295–310.
SACSAC-2009-OwensMR #dataset #mining
Capturing truthiness: mining truth tables in binary datasets (CCOI, TMM, NR), pp. 1467–1474.
ESEC-FSEESEC-FSE-2009-BirdBADBFD #bias #dataset #debugging
Fair and balanced?: bias in bug-fix datasets (CB, AB, EA, JD, AB, VF, PTD), pp. 121–130.
SIGMODSIGMOD-2008-PanZW #clustering #composition #dataset #matrix #named #performance #scalability
CRD: fast co-clustering on large datasets utilizing sampling-based matrix decomposition (FP, XZ, WW), pp. 173–184.
ICEISICEIS-DISI-2008-MuldnerML #data access #dataset #policy #xml
Succinct Access Control Policies for Published XML Datasets (TM, JKM, GL), pp. 380–385.
CIKMCIKM-2008-ChoudharyMB #dataset #evolution #on the
On quantifying changes in temporally evolving dataset (RC, SM, AB), pp. 1459–1460.
CIKMCIKM-2008-LiuLNBMG #dataset #feature model #performance #preprocessor #realtime #scalability
Real-time data pre-processing technique for efficient feature extraction in large scale datasets (YL, LVL, RSN, KB, PM, CLG), pp. 981–990.
CIKMCIKM-2008-NguyenS #analysis #correlation #dataset #performance
Fast correlation analysis on time series datasets (PN, NS), pp. 787–796.
CIKMCIKM-2008-ShawXG #approximate #dataset
Deriving non-redundant approximate association rules from hierarchical datasets (GS, YX, SG), pp. 1451–1452.
ICMLICML-2008-WolfeHK #dataset #distributed #scalability
Fully distributed EM for very large datasets (JW, AH, DK), pp. 1184–1191.
ICPRICPR-2008-WatanabeK08a #dataset #scalability
RANSAC-SVM for large-scale datasets (KW, TK), pp. 1–4.
KDDKDD-2008-DasSN #category theory #dataset #detection
Anomaly pattern detection in categorical datasets (KD, JGS, DBN), pp. 169–176.
FSEFSE-2008-OsterweilCEPWBH #dataset #experience #process #using #workflow
Experience in using a process language to define scientific workflow and generate dataset provenance (LJO, LAC, AME, RMP, AEW, ERB, JLH), pp. 319–329.
SIGMODSIGMOD-2007-XiaoT #dataset #named #privacy #towards
M-invariance: towards privacy preserving re-publication of dynamic datasets (XX, YT), pp. 689–700.
HCIHIMI-IIE-2007-CollinsNMRBCPFHP #collaboration #dataset
The Karst Collaborative Workspace for Analyzing and Annotating Scientific Datasets (LMC, DEN, MLBM, JVR, MAB, CRC, JEP, BFS, SKH, JCP), pp. 3–12.
KDDKDD-2007-DasS #category theory #dataset #detection
Detecting anomalous records in categorical datasets (KD, JGS), pp. 220–229.
PADLPADL-2007-Costa #dataset #performance #prolog
Prolog Performance on Larger Datasets (VSC), pp. 185–199.
DRRDRR-2006-ZhangA #dataset #towards
Toward quantifying the amount of style in a dataset (XZ, SA).
SIGMODSIGMOD-2006-KiferG #dataset #injection
Injecting utility into anonymized datasets (DK, JG), pp. 217–228.
VLDBVLDB-2006-GemullaLH #dataset #evolution #maintenance
A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets (RG, WL, PJH), pp. 595–606.
VLDBVLDB-2006-JiTT #3d #dataset #mining
Mining Frequent Closed Cubes in 3D Datasets (LJ, KLT, AKHT), pp. 811–822.
ICPRICPR-v1-2006-AnC #dataset
Finding Rule Groups to Classify High Dimensional Gene Expression Datasets (JA, YPPC), pp. 1196–1199.
ICPRICPR-v3-2006-FelsbergG #dataset #multi #named #robust #scalability
P-Channels: Robust Multivariate M-Estimation of Large Datasets (MF, GHG), pp. 262–267.
ICPRICPR-v4-2006-Jun #comparison #dataset #detection
A Peer Dataset Comparison Outlier Detection Model Applied to Financial Surveillance (TJ), pp. 900–903.
SACSAC-2006-FangG #bound #dataset
Boundary surface extraction and rendering for volume datasets (SF, PG), pp. 1356–1360.
SACSAC-2006-NemalhabibS #algorithm #category theory #clustering #dataset #named
CLUC: a natural clustering algorithm for categorical datasets based on cohesion (AN, NS), pp. 637–638.
ICDARICDAR-2005-AgrawalBMV #dataset #named #online #representation #xml
UPX: A New XML Representation for Annotated Datasets of Online Handwriting Data (MA, KB, SM, LV), pp. 1161–1165.
ICEISICEIS-v2-2005-DoP #dataset #mining #scalability #visualisation
Mining Very Large Datasets with SVM and Visualization (TND, FP), pp. 127–141.
KDDKDD-2005-JinWPPA #dataset #graph
Discovering frequent topological structures from graph datasets (RJ, CW, DP, SP, GA), pp. 606–611.
KDDKDD-2005-ZakiPAS #algorithm #category theory #clustering #dataset #effectiveness #mining #named
CLICKS: an effective algorithm for mining subspace clusters in categorical datasets (MJZ, MP, IA, TS), pp. 736–742.
MLDMMLDM-2005-FerrandizB05a #dataset #evaluation
Supervised Evaluation of Dataset Partitions: Advantages and Practice (SF, MB), pp. 600–609.
MLDMMLDM-2005-SiaL #clustering #dataset #scalability #using
Clustering Large Dynamic Datasets Using Exemplar Points (WS, MML), pp. 163–173.
SIGMODSIGMOD-2004-CongXPTY #array #dataset #named
FARMER: Finding Interesting Rule Groups in Microarray Datasets (GC, AKHT, XX, FP, JY), pp. 143–154.
VLDBVLDB-2004-HoweM #algebra #dataset
Algebraic Manipulation of Scientific Datasets (BH, DM), pp. 924–935.
CIKMCIKM-2004-ChenL #clustering #dataset #named #scalability #visualisation
ClusterMap: labeling clusters in large datasets via visualization (KC, LL), pp. 285–293.
CIKMCIKM-2004-ChungJM #clustering #dataset #mining #using
Mining gene expression datasets using density-based clustering (SC, JJ, DM), pp. 150–151.
ICPRICPR-v4-2004-WillisC #3d #dataset #multi #symmetry
Alignment of Multiple Non-Overlapping Axially Symmetric 3D Datasets (ARW, DBC), pp. 96–99.
KDDKDD-2004-TruongLB #dataset #learning #random #using
Learning a complex metabolomic dataset using random forests and support vector machines (YT, XL, CB), pp. 835–840.
SIGIRSIGIR-2004-DavidovGM #categorisation #dataset #generative
Parameterized generation of labeled datasets for text categorization based on a hierarchical directory (DD, EG, SM), pp. 250–257.
SACSAC-2004-AdamJA #dataset #detection
Neighborhood based detection of anomalies in high dimensional spatio-temporal sensor datasets (NRA, VPJ, VA), pp. 576–583.
SACSAC-2004-CarswellGN #dataset #multi #semantics #transaction
Wireless spatio-semantic transactions on multimedia datasets (JDC, KG, MN), pp. 1201–1205.
DRRDRR-2003-HauserSSDST #dataset
Correcting OCR text by association with historical datasets (SEH, JS, TFS, DDF, SS, GRT), pp. 84–93.
VLDBVLDB-2003-LinLYZ #dataset #multi #scalability
Multiscale Histograms: Summarizing Topological Relations in Large Spatial Datasets (XL, QL, YY, XZ), pp. 814–825.
ICEISICEIS-v2-2003-DoP #algorithm #dataset #mining #scalability
Mining Very Large Datasets with Support Vector Machine Algorithms (TND, FP), pp. 140–147.
ICMLICML-2003-LeskovecS #dataset #linear #programming
Linear Programming Boosting for Uneven Datasets (JL, JST), pp. 456–463.
ICMLICML-2003-ZhuWC #dataset #scalability
Eliminating Class Noise in Large Datasets (XZ, XW, QC), pp. 920–927.
KDDKDD-2003-El-HajjZ #dataset #interactive #matrix #mining #performance #scalability
Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining (MEH, ORZ), pp. 109–118.
KDDKDD-2003-KoyuturkG #dataset #framework #named
PROXIMUS: a framework for analyzing very high dimensional discrete-attributed datasets (MK, AG), pp. 147–156.
KDDKDD-2003-PanCTYZ #biology #dataset #named
Carpenter: finding closed patterns in long biological datasets (FP, GC, AKHT, JY, MJZ), pp. 637–642.
KDDKDD-2003-PeterCG #algorithm #clustering #dataset #scalability
New unsupervised clustering algorithm for large datasets (WP, JC, CG), pp. 643–648.
MLDMMLDM-2003-LeleuRBE #dataset #mining #named
GO-SPADE: Mining Sequential Patterns over Datasets with Consecutive Repetitions (ML, CR, JFB, GE), pp. 293–306.
SACSAC-2003-ChenL #clustering #dataset #visualisation
Cluster Rendering of Skewed Datasets via Visualization (KC, LL), pp. 909–916.
CIKMCIKM-2002-ZhaoK #algorithm #clustering #dataset #documentation #evaluation
Evaluation of hierarchical clustering algorithms for document datasets (YZ, GK), pp. 515–524.
KDDKDD-2002-RidgewayM #analysis #dataset
Bayesian analysis of massive datasets via particle filters (GR, DM), pp. 5–13.
KDDKDD-2002-TantrumMS #clustering #dataset #modelling #scalability
Hierarchical model-based clustering of large datasets through fractionation and refractionation (JT, AM, WS), pp. 183–190.
ICEISICEIS-v1-2001-KotsisWFM #dataset #multi #novel #visualisation
Novel Data Visualisation and Exploration in Multidimensional Datasets (NK, GRSW, JDF, DRM), pp. 170–175.
ICMLICML-2001-DomeniconiG #approach #approximate #classification #dataset #multi #nearest neighbour #performance #query #scalability
An Efficient Approach for Approximating Multi-dimensional Range Queries and Nearest Neighbor Classification in Large Datasets (CD, DG), pp. 98–105.
KDDKDD-2001-BeygelzimerPM #category theory #dataset #performance #scalability #visualisation
Fast ordering of large categorical datasets for better visualization (AB, CSP, SM), pp. 239–244.
KDDKDD-2000-BarbaraC #clustering #dataset #using
Using the fractal dimension to cluster datasets (DB, PC), pp. 260–264.
KDDKDD-2000-Yang #3d #dataset #interactive #relational #scalability
Interactive exploration of very large relational datasets through 3D dynamic projections (LY), pp. 236–243.
KDDKDD-2000-ZhangDR #constraints #dataset #scalability
Exploring constraints to efficiently mine emerging patterns from large high-dimensional datasets (XZ, GD, KR), pp. 310–314.
SIGMODSIGMOD-1999-MankuRL #dataset #online #order #performance #random #scalability #statistics
Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets (GSM, SR, BGL), pp. 251–262.
KDDKDD-1999-DaviesM #dataset #network
Bayesian Networks for Lossless Dataset Compression (SD, AWM), pp. 387–391.
VLDBVLDB-1998-GehrkeRG #dataset #framework #named #performance #scalability
RainForest — A Framework for Fast Decision Tree Construction of Large Datasets (JG, RR, VG), pp. 416–427.
VLDBVLDB-1998-KnorrN #algorithm #dataset #mining #scalability
Algorithms for Mining Distance-Based Outliers in Large Datasets (EMK, RTN), pp. 392–403.
VLDBVLDB-1998-ShuklaDN #dataset #multi
Materialized View Selection for Multidimensional Datasets (AS, PD, JFN), pp. 488–499.
KDDKDD-1998-AlsabtiRS #classification #dataset #named #scalability
CLOUDS: A Decision Tree Classifier for Large Datasets (KA, SR, VS), pp. 2–8.
KDDKDD-1998-OatesJ #dataset #modelling #scalability
Large Datasets Lead to Overly Complex Models: An Explanation and a Solution (TO, DJ), pp. 294–298.
SIGMODSIGMOD-1997-KornJF #ad hoc #dataset #query #scalability #sequence
Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences (FK, HVJ, CF), pp. 289–300.
SIGMODSIGMOD-1997-LivnyRBCDLMW #dataset #named #query #scalability #visualisation
DEVise: Integrated Querying and Visualization of Large Datasets (ML, RR, KSB, GC, DD, SL, JM, RKW), pp. 301–312.
SIGMODSIGMOD-1997-LivnyRBCDLMW97a #dataset #named #query #scalability #visual notation
DEVise: Integrated Querying and Visual Exploration of Large Datasets (Demo Abstract) (ML, RR, KSB, GC, DD, SL, JM, RKW), pp. 517–520.
KDDKDD-1997-ZupanBBC #approach #composition #data mining #dataset #mining
A Dataset Decomposition Approach to Data Mining and Machine Discovery (BZ, MB, IB, BC), pp. 299–302.
SIGMODSIGMOD-1995-FaloutsosL #algorithm #dataset #multi #named #performance #visualisation
FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets (CF, KIL), pp. 163–174.
KDDKDD-1995-StolorzNMMSSYNCMF #data mining #dataset #mining #performance #scalability
Fast Spatio-Temporal Data Mining of Large Geophysical Datasets (PES, HN, EM, RRM, ECS, JRS, JY, KWN, SYC, CRM, JDF), pp. 300–305.

Bibliography of Software Language Engineering in Generated Hypertext (BibSLEIGH) is created and maintained by Dr. Vadim Zaytsev.
Hosted as a part of SLEBOK on GitHub.