## Lise Getoor, Ted E. Senator, Pedro M. Domingos, Christos Faloutsos

*Proceedings of the Ninth International Conference on Knowledge Discovery and Data Mining*

KDD, 2003.

### Contents (96 items)

- KDD-2003-Gray #online #prototype
- On-line science: the world-wide telescope as a prototype for the new computational science (JG), p. 3.
- KDD-2003-Koller #learning #relational #statistics
- Statistical learning from relational data (DK), p. 4.
- KDD-2003-Weigend #behaviour
- Analyzing customer behavior at Amazon.com (ASW), p. 5.
- KDD-2003-Aggarwal #data mining #design #distance #mining #towards
- Towards systematic design of distance functions for data mining applications (CCA), pp. 9–18.
- KDD-2003-BanerjeeDGS #clustering #generative #modelling
- Generative model-based clustering of directional data (AB, ISD, JG, SS), pp. 19–28.
- KDD-2003-BayS #linear #mining
- Mining distance-based outliers in near linear time with randomization and a simple pruning rule (SDB, MS), pp. 29–38.
- KDD-2003-BilenkoM #adaptation #detection #metric #similarity #string #using
- Adaptive duplicate detection using learnable string similarity measures (MB, RJM), pp. 39–48.
- KDD-2003-BoltonA
- An iterative hypothesis-testing strategy for pattern discovery (RJB, NMA), pp. 49–58.
- KDD-2003-BronnimannCDHS #performance #reduction
- Efficient data reduction with EASE (HB, BC, MD, PJH, PS), pp. 59–68.
- KDD-2003-CasaliCL #semantics #using
- Extracting semantics from data cubes using cube transversals and closures (AC, RC, LL), pp. 69–78.
- KDD-2003-ChudovaGMS #clustering #invariant #modelling
- Translation-invariant mixture models for curve clustering (DC, SG, EM, PS), pp. 79–88.
- KDD-2003-DhillonMM #clustering
- Information-theoretic co-clustering (ISD, SM, DSM), pp. 89–98.
- KDD-2003-EirinakiVV #named #personalisation #process #semantics #taxonomy #using #web
- SEWeP: using site semantics and a taxonomy to enhance the Web personalization process (ME, MV, IV), pp. 99–108.
- KDD-2003-El-HajjZ #dataset #interactive #matrix #mining #performance #scalability
- Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining (MEH, ORZ), pp. 109–118.
- KDD-2003-EtzioniTKY #mining
- To buy or not to buy: mining airfare data to minimize ticket purchase price (OE, RT, CAK, AY), pp. 119–128.
- KDD-2003-GionisKM #order
- Fragments of order (AG, TK, HM), pp. 129–136.
- KDD-2003-KempeKT #network #social
- Maximizing the spread of influence through a social network (DK, JMK, ÉT), pp. 137–146.
- KDD-2003-KoyuturkG #dataset #framework #named
- PROXIMUS: a framework for analyzing very high dimensional discrete-attributed datasets (MK, AG), pp. 147–156.
- KDD-2003-PampalkGW #feature model #visualisation
- Visualizing changes in the structure of data for exploratory feature selection (EP, WG, GW), pp. 157–166.
- KDD-2003-PerlichP #concept #relational
- Aggregation-based feature invention and relational concept classes (CP, FJP), pp. 167–176.
- KDD-2003-SarawagiCG #learning #named #probability #topic
- Cross-training: learning probabilistic mappings between topics (SS, SC, SG), pp. 177–186.
- KDD-2003-SripadaRHY #generative #summary #using
- Generating English summaries of time series data using the Gricean maxims (SS, ER, JH, JY), pp. 187–196.
- KDD-2003-TantrumMS #assessment #clustering #modelling
- Assessment and pruning of hierarchical model based clustering (JT, AM, WS), pp. 197–205.
- KDD-2003-VaidyaC #clustering #privacy
- Privacy-preserving k-means clustering over vertically partitioned data (JV, CC), pp. 206–215.
- KDD-2003-VlachosHGK #distance #metric #multi
- Indexing multi-dimensional time-series with support for multiple distance measures (MV, MH, DG, EJK), pp. 216–225.
- KDD-2003-WangFYH #classification #concept #data type #mining #using
- Mining concept-drifting data streams using ensemble classifiers (HW, WF, PSY, JH), pp. 226–235.
- KDD-2003-WangHP #mining
- CLOSET+: searching for the best strategies for mining frequent closed itemsets (JW, JH, JP), pp. 236–245.
- KDD-2003-WangJL #mining
- Mining unexpected rules by pushing user dynamics (KW, YJ, LVSL), pp. 246–255.
- KDD-2003-WebbBN #detection #difference #on the
- On detecting differences between groups (GIW, SMB, DAN), pp. 256–265.
- KDD-2003-WhiteS #algorithm #network
- Algorithms for estimating relative importance in networks (SW, PS), pp. 266–275.
- KDD-2003-WuBY #modelling #multi
- Screening and interpreting multi-item associations based on log-linear modeling (XW, DB, YY), pp. 276–285.
- KDD-2003-YanH #graph #mining #named
- CloseGraph: mining closed frequent graph patterns (XY, JH), pp. 286–295.
- KDD-2003-YiLL #data mining #mining #web
- Eliminating noisy information in Web pages for data mining (LY, BL, XL), pp. 296–305.
- KDD-2003-YuYH #clustering #scalability #set #using
- Classifying large data sets using SVMs with hierarchical clusters (HY, JY, JH), pp. 306–315.
- KDD-2003-ZakiA #classification #effectiveness #named #xml
- XRules: an effective structural classifier for XML data (MJZ, CCA), pp. 316–325.
- KDD-2003-ZakiG #mining #performance #using
- Fast vertical mining using diffsets (MJZ, KG), pp. 326–335.
- KDD-2003-ZhuS #data type #detection #performance
- Efficient elastic burst detection in data streams (YZ, DS), pp. 336–345.
- KDD-2003-AliK #clustering #divide and conquer #using #web
- Golden Path Analyzer: using divide-and-conquer to cluster Web clickstreams (KA, SPK), pp. 349–358.
- KDD-2003-FramAD #data mining #empirical #mining #safety
- Empirical Bayesian data mining for discovering patterns in post-marketing drug safety (DMF, JSA, WD), pp. 359–368.
- KDD-2003-HoNKLNYT #abstraction #mining
- Mining hepatitis data with temporal abstraction (TBH, TDN, SK, SQL, DN, HY, KT), pp. 369–377.
- KDD-2003-JensenRB #assessment
- Information awareness: a prospective technical assessment (DJ, MJR, HB), pp. 378–387.
- KDD-2003-LastFK #approach #automation #data mining #mining #testing
- The data mining approach to automated software testing (ML, MF, AK), pp. 388–396.
- KDD-2003-LawrenceHC #modelling #predict
- Passenger-based predictive modeling of airline no-show rates (RDL, SJH, JC), pp. 397–406.
- KDD-2003-Piatetsky-ShapiroKR #array #data analysis
- Capturing best practice for microarray gene expression data analysis (GPS, TK, SR), pp. 407–415.
- KDD-2003-RaoSNGR #analysis
- Clinical and financial outcomes analysis with existing hospital patient records (RBR, SS, RSN, CG, HR), pp. 416–425.
- KDD-2003-SahooORGMMVS #clustering #predict #scalability
- Critical event prediction for proactive management in large-scale computer clusters (RKS, AJO, IR, MG, JEM, SM, RV, AS), pp. 426–435.
- KDD-2003-SheCWEGB #predict
- Frequent-subsequence-based prediction of outer membrane proteins (RS, FC, KW, ME, JLG, FSLB), pp. 436–445.
- KDD-2003-SteinbachTKKP #clustering #using
- Discovery of climate indices using clustering (MS, PNT, VK, SAK, CP), pp. 446–455.
- KDD-2003-WeissBKD #data mining #knowledge-based #mining
- Knowledge-based data mining (SMW, SJB, SK, SD), pp. 456–461.
- KDD-2003-WuGLYC #multimodal
- The anatomy of a multimodal information filter (YLW, KG, BL, HY, EYC), pp. 462–471.
- KDD-2003-ArgamonSS #mining #multi
- Style mining of electronic messages for multiple authorship discrimination: first results (SA, MS, SSS), pp. 475–480.
- KDD-2003-BhatnagarKN #classification #mining
- Mining high dimensional data for classifier knowledge (RB, GK, WN), pp. 481–486.
- KDD-2003-ChangL #adaptation #data type #online
- Finding recent frequent itemsets adaptively over online data streams (JHC, WSL), pp. 487–492.
- KDD-2003-ChiuKL #probability
- Probabilistic discovery of time series motifs (BYcC, EJK, SL), pp. 493–498.
- KDD-2003-CohenWM #comprehension
- Understanding captions in biomedical publications (WWC, RCW, RFM), pp. 499–504.
- KDD-2003-DuZ #data mining #mining #privacy #random #using
- Using randomized response techniques for privacy-preserving data mining (WD, JZZ), pp. 505–510.
- KDD-2003-DuMouchelA #design
- Applications of sampling and fractional factorial designs to model-free data squashing (WD, DKA), pp. 511–516.
- KDD-2003-FradkinM #machine learning #random
- Experiments with random projections for machine learning (DF, DM), pp. 517–522.
- KDD-2003-GamaRM #data type #mining #performance
- Accurate decision trees for mining high-speed data streams (JG, RR, PM), pp. 523–528.
- KDD-2003-GuhaGK #correlation #data type
- Correlating synchronous and asynchronous data streams (SG, DG, NK), pp. 529–534.
- KDD-2003-GunduzO #behaviour #modelling #predict #representation #web
- A Web page prediction model based on click-stream tree representation of user behavior (SG, MTÖ), pp. 535–540.
- KDD-2003-HopcroftKKS #community #network #scalability
- Natural communities in large linked networks (JEH, OK, BK, BS), pp. 541–546.
- KDD-2003-Houle #clustering #navigation #set
- Navigating massive data sets via local clustering (MEH), pp. 547–552.
- KDD-2003-HsuDL #database #image #mining
- Mining viewpoint patterns in image databases (WH, JD, MLL), pp. 553–558.
- KDD-2003-Jermaine #correlation #game studies
- Playing hide-and-seek with correlations (CJ), pp. 559–564.
- KDD-2003-JiangPZ #interactive
- Interactive exploration of coherent patterns in time-series gene expression data (DJ, JP, AZ), pp. 565–570.
- KDD-2003-JinA #performance #streaming
- Efficient decision tree construction on streaming data (RJ, GA), pp. 571–576.
- KDD-2003-JoshiAKN #documentation #similarity #web
- A bag of paths model for measuring structural similarity in Web documents (SJ, NA, RK, SN), pp. 577–582.
- KDD-2003-Kamishima #collaboration #order #recommendation
- Nantonac collaborative filtering: recommendation based on order responses (TK), pp. 583–588.
- KDD-2003-KorenH #clustering #visualisation
- A two-way visualization method for clustered data (YK, DH), pp. 589–594.
- KDD-2003-LeungP #empirical
- Empirical comparisons of various voting methods in bagging (KTL, DSPJ), pp. 595–600.
- KDD-2003-LiuGZ #mining #web
- Mining data records in Web pages (BL, RLG, YZ), pp. 601–606.
- KDD-2003-LiuLLY #on the #query
- On computing, storing and querying frequent patterns (GL, HL, WL, JXY), pp. 607–612.
- KDD-2003-MaP #detection #online #sequence
- Online novelty detection on temporal sequences (JM, SP), pp. 613–618.
- KDD-2003-MorinagaYT #distributed #mining
- Distributed cooperative mining for information consortia (SM, KY, JiT), pp. 619–624.
- KDD-2003-NevilleJFH #learning #probability #relational
- Learning relational probability trees (JN, DJ, LF, MH), pp. 625–630.
- KDD-2003-NobleC #detection #graph
- Graph-based anomaly detection (CCN, DJC), pp. 631–636.
- KDD-2003-PanCTYZ #biology #dataset #named
- Carpenter: finding closed patterns in long biological datasets (FP, GC, AKHT, JY, MJZ), pp. 637–642.
- KDD-2003-PeterCG #algorithm #clustering #dataset #scalability
- New unsupervised clustering algorithm for large datasets (WP, JC, CG), pp. 643–648.
- KDD-2003-SequeiraZSC #data mining #locality #mining #source code
- Improving spatial locality of programs via data mining (KS, MJZ, BKS, CDC), pp. 649–654.
- KDD-2003-TangZP #mining
- Mining phenotypes and informative genes from gene expression data (CT, AZ, JP), pp. 655–660.
- KDD-2003-TaoMF #framework #mining #using
- Weighted association rule mining using weighted support and significance framework (FT, FM, MMF), pp. 661–666.
- KDD-2003-TeohM #interactive #named #visualisation
- PaintingClass: interactive construction, visualization and exploration of decision trees (STT, KLM), pp. 667–672.
- KDD-2003-TsamardinosAS #markov #performance
- Time and sample efficient discovery of Markov blankets and direct causal relations (IT, CFA, ARS), pp. 673–678.
- KDD-2003-YuC #distributed #multi
- Distributed multivariate regression based on influential observations (HY, ECC), pp. 679–684.
- KDD-2003-YuL
- Efficiently handling feature redundancy in high-dimensional data (LY, HL), pp. 685–690.
- KDD-2003-AlonsoBLB #adaptation #nearest neighbour
- An adaptive nearest neighbor search for a parts acquisition ePortal (RA, JAB, HL, CB), pp. 693–698.
- KDD-2003-BarryZM #architecture #information management #simulation
- Architecting a knowledge discovery engine for military commanders utilizing massive runs of simulations (PSB, JZ, MM), pp. 699–704.
- KDD-2003-DasuVW #information management #quality
- Data quality through knowledge engineering (TD, GTV, JRW), pp. 705–710.
- KDD-2003-LauLW #analysis #similarity
- Similarity analysis on government regulations (GTL, KHL, GW), pp. 711–716.
- KDD-2003-MayerS #design
- Experimental design for solicitation campaigns (UFM, AS), pp. 717–722.
- KDD-2003-OteyPGLNP #detection #towards
- Towards NIC-based intrusion detection (MEO, SP, AG, GL, SN, DKP), pp. 723–728.
- KDD-2003-PerngTGMH #data-driven #network #validation
- Data-driven validation, completion and construction of event relationship networks (CSP, DT, GG, SM, JLH), pp. 729–734.
- KDD-2003-PrattT #concept #visualisation
- Visualizing concept drift (KBP, GT), pp. 735–740.
- KDD-2003-ShimazuMF #case study
- Experimental study of discovering essential information from customer inquiry (KS, AM, KF), pp. 741–746.
- KDD-2003-ZhangSY #data mining #mining
- Applying data mining in investigating money laundering crimes (ZZ, JJS, PSY), pp. 747–752.

