Tag #corpus
140 papers:
EDM-2019-SinclairMLLG - Tutorbot Corpus: Evidence of Human-Agent Verbal Alignment in Second Language Learner Dialogues (AS, KM, CGL, AL, DG).
ICSME-2019-WongSCH #fault #stack overflow #syntax- Syntax and Stack Overflow: A Methodology for Extracting a Corpus of Syntax Errors and Fixes (AWW, AS, SAC, AH), pp. 318–322.
JCDL-2018-WillkommSSSB #algebra #query- A Query Algebra for Temporal Text Corpora (JW, CSP, MS, MS, KB), pp. 183–192.
EDM-2018-CaiGWCSH #evaluation #wiki- Impact of Corpus Size and Dimensionality of LSA Spaces from Wikipedia Articles on AutoTutor Answer Evaluation (ZC, AG, LW, QC, DWS, XH).
SANER-2018-MoverSOC #framework #graph #mining- Mining framework usage graphs from app corpora (SM, SS0, RBPO, BYEC), pp. 277–289.
CIKM-2018-0001RHZGHJL0 #profiling- Open-Schema Event Profiling for Massive News Corpora (QY0, XR, WH, CZ0, XG, LH, HJ, CYL, JH0), pp. 587–596.
CIKM-2018-RepkeKEHHKSSZ #email #interactive #scalability- Beacon in the Dark: A System for Interactive Exploration of Large Email Corpora (TR, RK, JE, MH, JH, DK, HS, NS, AZ), pp. 1871–1874.
CIKM-2018-XuLHLXYW #multi #named #type system- METIC: Multi-Instance Entity Typing from Corpus (BX, ZL, LH, BL, YX, DY, WW0), pp. 903–912.
KDD-2018-StaarDAB #documentation #framework #machine learning #platform #scalability- Corpus Conversion Service: A Machine Learning Platform to Ingest Documents at Scale (PWJS, MD, CA, CB), pp. 774–782.
JCDL-2017-BuchananM #library- The Lowest Form of Flattery: Characterising Text Re-Use and Plagiarism Patterns in a Digital Library Corpus (GB, DM), pp. 159–168.
ECIR-2017-CeroniGF #constraints #crowdsourcing #named #strict #validation- JustEvents: A Crowdsourced Corpus for Event Validation with Strict Temporal Constraints (AC, UG, MF), pp. 484–492.
KDD-2017-JiangSCRKH0 #named- MetaPAD: Meta Pattern Discovery from Massive Text Corpora (MJ0, JS, TC, XR, LMK, TPH, JH0), pp. 877–886.
JCDL-2016-TraubSOHVH #assessment #bias #scalability- Querylog-based Assessment of Retrievability Bias in a Large Newspaper Corpus (MCT, TS, JvO, JH, APdV, LH), pp. 7–16.
ICSME-2016-Ragkhitwetsagul #similarity- Measuring Code Similarity in Large-Scaled Code Corpora (CR), pp. 626–630.
MSR-2016-ChowdhuryH #energy #metric #named- GreenOracle: estimating software energy consumption with energy measurement corpora (SAC, AH), pp. 49–60.
CIG-2016-Nelson #game studies #scalability- Investigating vanilla MCTS scaling on the GVG-AI game corpus (MJN), pp. 1–7.
CIKM-2016-SiddiquiRPH #documentation #named #scalability- FacetGist: Collective Extraction of Document Facets in Large Technical Corpora (TS, XR, AGP, JH0), pp. 871–880.
Onward-2016-ZilbersteinY #natural language #similarity- Leveraging a corpus of natural language descriptions for program similarity (MZ, EY), pp. 197–211.
HT-2015-Bayomi #adaptation #framework #reuse #using- A Framework to Provide Customized Reuse of Open Corpus Content for Adaptive Systems (MB), pp. 315–318.
SIGMOD-2015-LiuSWRH #mining #quality- Mining Quality Phrases from Massive Text Corpora (JL, JS, CW, XR, JH), pp. 1729–1744.
VLDB-2015-HeGC #named #semantics #using- SEMA-JOIN: Joining Semantically-Related Tables Using Big Table Corpora (YH, KG, XC), pp. 1358–1369.
ICSME-2015-WangPV #mining #scalability- Developing a model of loop actions by mining loop characteristics from a large code corpus (XW, LLP, KVS), pp. 51–60.
MSR-2015-BarikLSSM #named #spreadsheet- Fuse: A Reproducible, Extendable, Internet-Scale Corpus of Spreadsheets (TB, KL, JS, JS, ERMH), pp. 486–489.
CHI-2015-WoltersKMDM #design #interface- The CADENCE Corpus: A New Resource for Inclusive Voice Interface Design (MKW, JK, SEM, MD, JDM), pp. 3963–3966.
ECIR-2015-CarrascoMMSRE - Linguistically-Enhanced Search over an Open Diachronic Corpus (RCC, IMS, EMG, FSM, GCR, MPEE), pp. 801–804.
ECIR-2015-HagenWS #topic #web- A Corpus of Realistic Known-Item Topics with Associated Web Pages in the ClueWeb09 (MH, DW, BS), pp. 513–525.
KDD-2015-RenEWH #approach #automation #mining #network #recognition #type system- Automatic Entity Recognition and Typing from Massive Text Corpora: A Phrase and Network Mining Approach (XR, AEK, CW, JH), pp. 2319–2320.
SIGIR-2015-ChakrabortyGP #retrieval- Retrieval from Noisy E-Discovery Corpus in the Absence of Training Data (AC, KG, SKP), pp. 755–758.
SIGIR-2015-HeindorfPSE #analysis #detection #knowledge base #towards- Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis (SH, MP, BS, GE), pp. 831–834.
JCDL-2014-CrawfordFLP #linked data #music #open data- Explorations in Linked Data practice for early music corpora (TC, BF, DL, KRP), pp. 309–312.
VLDB-2015-El-KishkySWVH14 #mining #scalability #topic- Scalable Topical Phrase Mining from Text Corpora (AEK, YS, CW, CRV, JH), pp. 305–316.
ICSME-2014-LandmanSV #analysis #empirical #java #scalability- Empirical Analysis of the Relationship between CC and SLOC in a Large Corpus of Java Methods (DL, AS, JJV), pp. 221–230.
SCAM-2014-CaraccioloCSL #multi #named- Pangea: A Workbench for Statically Analyzing Multi-language Software Corpora (AC, AC, BS, ML), pp. 71–76.
HCI-AIMT-2014-JohnsonMV #user interface- Harmonic Navigator: An Innovative, Gesture-Driven User Interface for Exploring Harmonic Spaces in Musical Corpora (DJ, BZM, YV), pp. 58–68.
HCI-AS-2014-JiaNBBT #framework #named #online #platform #research- CORPUS: Next-Generation Online Platform for Research Collaborations in Humanities (YJ, XN, RB, DB, ADT), pp. 3–12.
HIMI-DE-2014-KobayashiS #topic- Finding Division Points for Time-Series Corpus Based on Topic Changes (HK, RS), pp. 364–372.
CIKM-2014-MukherjeeAJ #framework #ontology- Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construction from Corpus (SM, JA, SJ), pp. 929–938.
ECIR-2014-BauerCRG #formal method #learning #web- Learning a Theory of Marriage (and Other Relations) from a Web Corpus (SB, SC, LR, TG), pp. 591–597.
SIGIR-2014-TongWZ #taxonomy- Principled dictionary pruning for low-memory corpus compression (JT, AW, JZ), pp. 283–292.
OOPSLA-2014-HsiaoCN #program analysis #statistics #using #web- Using web corpus statistics for program analysis (CHH, MJC, SN), pp. 49–65.
FSE-2014-Nguyen0NR #api #mining #scalability- Mining preconditions of APIs in large-scale code corpus (HAN, RD, TNN, HR), pp. 166–177.
JCDL-2013-BeckerD #benchmark #metric #modelling #set #using- Free benchmark corpora for preservation experiments: using model-driven engineering to generate data sets (CB, KD), pp. 349–358.
JCDL-2013-Organisciak #clustering- Addressing diverse corpora with cluster-based term weighting (PO), pp. 163–166.
TPDL-2013-TranZBNK #analysis #topic- Topic Cropping: Leveraging Latent Topics for the Analysis of Small Corpora (NKT, SZ, KB, CN, RK), pp. 297–308.
ICSM-2013-DasguptaGMDP #automation #documentation #traceability- Enhancing Software Traceability by Automatically Expanding Corpora with Relevant Documentation (TD, MG, EM, BD, DP), pp. 320–329.
CIKM-2013-McMinnMJ #detection #scalability #twitter- Building a large-scale corpus for evaluating event detection on twitter (AJM, YM, JMJ), pp. 409–418.
CIKM-2013-Zhang0D #mining #query- Mining a search engine’s corpus without a query pool (MZ, NZ, GD), pp. 29–38.
ECIR-2013-RahimiS #approach #modelling- A Language Modeling Approach for Extracting Translation Knowledge from Comparable Corpora (RR, AS), pp. 606–617.
ICML-c2-2013-KimVS #approximate #modelling #topic- A Variational Approximation for Topic Modeling of Hierarchical Corpora (DkK, GMV, LKS), pp. 55–63.
SIGIR-2013-LiangR #enterprise #information management- Finding knowledgeable groups in enterprise corpora (SL, MdR), pp. 1005–1008.
DocEng-2012-WidlocherM #framework #mining #platform- The Glozz platform: a corpus annotation and mining tool (AW, YM), pp. 171–180.
HT-2012-OKeeffeOCLW #adaptation #hypermedia #modelling #semantics #web- Linked open corpus models, leveraging the semantic web for adaptive hypermedia (IO, AO, PC, SL, VW), pp. 321–322.
SIGMOD-2012-SliwkanichSYHB #scalability #summary #towards #visualisation- Towards scalable summarization and visualization of large text corpora (TS, DS, AY, MH, DB), p. 863.
ITiCSE-2012-PoonSTK #detection #source code- Instructor-centric source code plagiarism detection and plagiarism corpus (JYHP, KS, YFT, MYK), pp. 122–127.
CIG-2012-SwansonEJ #composition #learning #visual notation- Learning visual composition preferences from an annotated corpus generated through gameplay (RS, DE, AJ), pp. 363–370.
CIKM-2012-LiCLP #ontology #performance- Efficient extraction of ontologies from domain specific text corpora (TL, PC, LVSL, RP), pp. 1537–1541.
CIKM-2012-LiLJWZH #parallel- Joint bilingual name tagging for parallel corpora (QL, HL, HJ, WW, JZ, FH), pp. 1727–1731.
CIKM-2012-SiposSSJ #summary #using #word- Temporal corpus summarization using submodular word coverage (RS, AS, PS, TJ), pp. 754–763.
CIKM-2012-XiangFWHR #detection #scalability #topic #twitter- Detecting offensive tweets via topical feature discovery over a large scale twitter corpus (GX, BF, LW, JIH, CPR), pp. 1980–1984.
ECIR-2012-TholpadiDBS #clustering #multi #using- Cluster Labeling for Multilingual Scatter/Gather Using Comparable Corpora (GT, MKD, CB, SKS), pp. 388–400.
MLDM-2012-WangYL - Measuring the Dynamic Relatedness between Chinese Entities Orienting to News Corpus (ZW, JY, XL), pp. 631–644.
SIGIR-2012-CastelliRFHLR - Distilling and exploring nuggets from a corpus (VC, HR, RF, DJH, XL, SR), p. 1006.
SIGIR-2012-McCreadieSLMOM #on the #reuse #twitter- On building a reusable Twitter corpus (RM, IS, JL, CM, IO, DM), pp. 1113–1114.
SIGIR-2012-PotthastHSGMTW #named- ChatNoir: a search engine for the ClueWeb09 corpus (MP, MH, BS, JG, MM, MT, CW), p. 1004.
DRR-2011-EynardME #framework #navigation- A framework to improve digital corpus uses: image-mode navigation (LE, VM, HE), pp. 1–10.
ICDAR-2011-ChattopadhyaySSR #analysis- Creation and Analysis of a Corpus of Text Rich Indian TV Videos (TC, SS, AS, NR), pp. 849–853.
SIGMOD-2011-ZhangZD #estimation #mining #performance- Mining a search engine’s corpus: efficient yet unbiased sampling and aggregate estimation (MZ, NZ, GD), pp. 793–804.
TPDL-2011-DeclerckCMRB #framework- A Text Technology Infrastructure for Annotating Corpora in the eHumanities (TD, UC, KM, CR, GB), pp. 457–460.
CIKM-2011-JacobDG #classification #multi #social #using- Classification and annotation in social corpora using multiple relations (YJ, LD, PG), pp. 1215–1220.
CIKM-2011-KimJHSZ #approach #graph #mining- Mining entity translations from comparable corpora: a holistic graph mapping approach (JK, LJ, SwH, YIS, MZ), pp. 1295–1304.
CIKM-2011-VarolCAK #detection #named- CoDet: sentence-based containment detection in news corpora (EV, FC, CA, OK), pp. 2049–2052.
CIKM-2011-YamamotoNT11a #community- Extracting adjective facets from community Q&A corpus (TY, SN, KT), pp. 2021–2024.
CIKM-2011-YeungI #generative #multi- Extracting multi-dimensional relations: a generative model of groups of entities in a corpus (CmAY, TI), pp. 1203–1208.
SIGIR-2011-HoobinPZ - Sample selection for dictionary-based corpus compression (CH, SJP, JZ), pp. 1137–1138.
SAC-2011-YelogluMZ #multi #summary- Multi-document summarization of scientific corpora (OY, EEM, ANZH), pp. 252–258.
SIGMOD-2010-SiferLWB #keyword #multi #summary- Integrating keyword search with multiple dimension tree views over a summary corpus data cube (MS, JL, YW, SB), pp. 1167–1170.
ICEIS-HCI-2010-KralC #automation #web- Automatic Dialog Act Corpus Creation from Web Pages (PK, CC), pp. 198–203.
CIKM-2010-AjmeraKLMP #parallel #web- Alignment of short length parallel corpora with an application to web search (JA, HSK, KPL, SM, MP), pp. 1477–1480.
ECIR-2010-JagarlamudiD #multi #topic- Extracting Multilingual Topics from Unaligned Comparable Corpora (JJ, HDI), pp. 444–456.
ICPR-2010-ElnakibECS #analysis- Dyslexia Diagnostics by Centerline-Based Shape Analysis of the Corpus Callosum (AE, AEB, MC, AES), pp. 261–264.
ICPR-2010-PastorTCV - A Bi-modal Handwritten Text Corpus: Baseline Results (MP, AHT, FC, EV), pp. 1933–1936.
ICPR-2010-RomeroTV #analysis #image- Computer Assisted Transcription of Text Images: Results on the GERMANA Corpus and Analysis of Improvements Needed for Practical Use (VR, AHT, EV), pp. 2017–2020.
KDD-2010-ZhangSZL #correlation #multi #process- Evolutionary hierarchical dirichlet processes for multiple correlated time-varying corpora (JZ, YS, CZ, SL), pp. 1079–1088.
SIGIR-2010-Potthast #crowdsourcing #wiki- Crowdsourcing a wikipedia vandalism corpus (MP), pp. 789–790.
HT-2009-SteichenLOW #generative #hypermedia #reuse- Dynamic hypertext generation for reusing open corpus content (BS, SL, AO, VW), pp. 119–128.
ICDAR-2009-Martin-ToralAP #detection #documentation- Detection of Incoherences in a Document Corpus Based on the Application of a Neuro-Fuzzy System (SMT, VA, GISP), pp. 1101–1105.
HCD-2009-NakanoR #analysis #multimodal #usability- Multimodal Corpus Analysis as a Method for Ensuring Cultural Usability of Embodied Conversational Agents (YIN, MR), pp. 521–530.
SAC-2009-LiuMYGF #mining #probability- A sentence level probabilistic model for evolutionary theme pattern mining from news corpora (SL, YM, WGY, NG, OF), pp. 1742–1747.
HT-2008-LawlessHW #education #learning- Enhancing access to open corpus educational content: learning in the wild (SL, LH, VW), pp. 167–174.
ICEIS-AIDSS-2008-Martin-ToralSD #detection #documentation- Detection of Incoherences in a Technical and Normative Document Corpus (SMT, GISP, YAD), pp. 282–287.
CIKM-2008-CustisA #query #statistics- Investigating external corpus and clickthrough statistics for query expansion in the legal domain (TC, KAK), pp. 1363–1364.
CIKM-2008-RogatiYC #information retrieval #optimisation- Corpus microsurgery: criteria optimization for medical cross-language ir (MR, YY, JGC), pp. 1365–1366.
CIKM-2008-UdupaSKJ #mining- Mining named entity transliteration equivalents from comparable corpora (RU, KS, AK, JJ), pp. 1423–1424.
ECIR-2008-AyacheQ #learning #using #video- Video Corpus Annotation Using Active Learning (SA, GQ), pp. 187–198.
ECIR-2008-ChoudharyMBB #evolution #interactive #towards- Towards Characterization of Actor Evolution and Interactions in News Corpora (RC, SM, AB, RB), pp. 422–429.
ECIR-2008-Talvensaari #quality- Effects of Aligned Corpus Quality and Size in Corpus-Based CLIR (TT), pp. 114–125.
SIGIR-2008-Banerjee #classification #modelling #topic #using- Improving text classification accuracy using topic modeling over an additional corpus (SB), pp. 867–868.
DRR-2007-HeD #adaptation #clustering #retrieval- Combining text clustering and retrieval for corpus adaptation (FH, XD).
ICDAR-2007-VargasFTA - Off-line Handwritten Signature GPDS-960 Corpus (FV, MAF, CMT, JBA), pp. 764–768.
JCDL-2007-StewartCB #generative #mining #scalability- A new generation of textual corpora: mining corpora from very large collections (GS, GRC, AB), pp. 356–365.
ITiCSE-2007-TremblayMSZ #maintenance #student- Introducing students to professional software construction: a “software construction and maintenance” course and its maintenance corpus (GT, BM, AS, PZ), pp. 176–180.
LATA-2007-YoonSK #rule-based #word- Rule-based Word Spacing in Korean Based on Lexical Information Extracted from a Corpus (JY, GYS, SK), pp. 589–599.
DHM-2007-ZhengLODK #simulation- Human Motion Simulation and Action Corpus (GZ, WL, PO, LD, IK), pp. 314–322.
KDD-2007-BhagwatEM #clustering #documentation #scalability #similarity- Content-based document routing and index partitioning for scalable similarity-based searches in a large corpus (DB, KE, PM), pp. 105–112.
SIGIR-2007-LiLHC #ad hoc #query #using #wiki- Improving weak ad-hoc queries using wikipedia as external corpus (YL, RWPL, EKSH, KFLC), pp. 797–798.
HT-2006-JensenM #multi #retrieval #web- Different indexing strategies for multilingual web retrieval: experiments with the EuroGOV corpus (NJ, TM), pp. 169–170.
CIKM-2006-BroderFJKMNPTX #query- Estimating corpus size via queries (AZB, MF, VJ, RK, RM, SUN, RP, AT, YX), pp. 594–603.
CIKM-2006-ShiN #adaptation #information retrieval #parallel- Filtering or adapting: two strategies to exploit noisy parallel corpora for cross-language information retrieval (LS, JYN), pp. 814–815.
ECIR-2006-ZhangWGV #automation #parallel #web- Automatic Acquisition of Chinese-English Parallel Corpus from the Web (YZ, KW, JG, PV), pp. 420–431.
ICPR-v1-2006-ZhangLSC #classification #performance- An Efficient SVM Classifier for Lopsided Corpora (XZ, BCL, WS, LC), pp. 1144–1147.
SIGIR-2006-BalogAR #enterprise #formal method #modelling- Formal models for expert finding in enterprise corpora (KB, LA, MdR), pp. 43–50.
SIGIR-2006-DiazM #estimation #modelling #scalability #using- Improving the estimation of relevance models using large external corpora (FD, DM), pp. 154–161.
ICDAR-2005-MihovSRDN #comparative #evaluation- A Corpus for Comparative Evaluation of OCR Software and Postcorrection Techniques (SM, KUS, CR, VD, VN), pp. 162–166.
DiGRA-2005-Goggin #question- Corpus Simsi: Or Can a Body Tell a Story? (JG).
KDD-2005-TaoZ #integration #mining- Mining comparable bilingual text corpora for cross-language information integration (TT, CZ), pp. 691–696.
SIGIR-2005-ZhouG #categorisation #geometry #on the- On redundancy of training corpus for text categorization: a perspective of geometry (SZ, JG), pp. 671–672.
JCDL-2004-BierGPN #documentation- A document corpus browser for in-depth reading (EAB, LG, KP, AN), pp. 87–96.
CIKM-2004-LitaC - Unsupervised question answering data acquisition from local corpora (LVL, JGC), pp. 607–614.
ECIR-2004-ChenTH #identification #novel #using- Identification of Relevant and Novel Sentences Using Reference Corpus (HHC, MFT, MHH), pp. 85–98.
SIGIR-2004-ChengTCWLC #information retrieval #query #web- Translating unknown queries with web corpora for cross-language information retrieval (PJC, JWT, RCC, JHW, WHL, LFC), pp. 146–153.
SIGIR-2004-ConradS #detection- Constructing a text corpus for inexact duplicate detection (JGC, CPS), pp. 582–583.
SIGIR-2004-KurlandL #ad hoc #information retrieval #modelling- Corpus structure, language models, and ad hoc information retrieval (OK, LL), pp. 194–201.
ICEIS-v2-2003-MengYLCCH - Act E-Service Question Answering Systems Based on Faq Corpus (IHM, WPY, HYL, YLC, BC, SLH), pp. 286–293.
SIGIR-2003-SadatYU #automation #information retrieval- Enhancing cross-language information retrieval by an automatic acquisition of bilingual terminology from comparable corpora (FS, MY, SU), pp. 397–398.
SIGIR-2002-ClarkeCLLT #performance- The impact of corpus size on question answering performance (CLAC, GVC, ML, TRL, ELT), pp. 369–370.
JCDL-2001-Rydberg-CoxMC #documentation #quality- Document quality indicators and corpus editions (JARC, AM, GRC), pp. 435–436.
CIKM-2001-GhaniJM #mining #web- Mining the Web to Create Minority Language Corpora (RG, RJ, DM), pp. 279–286.
SIGIR-2001-FranzMWZ01a #parallel- Quantifying the Utility of Parallel Corpora (MF, JSM, TW, WJZ), pp. 398–399.
SIGIR-2001-GhaniJM #automation #generative #query #web- Automatic Web Search Query Generation to Create Minority Language Corpora (RG, RJ, DM), pp. 432–433.
DL-2000-CraneR #editing- New technology and new roles: the need for “corpus editors” (GRC, JARC), pp. 252–253.
ICML-2000-HosteDSG - Meta-Learning for Phonemic Annotation of Corpora (VH, WD, EFTKS, SG), pp. 375–382.
SIGIR-1999-Marcu #automation #research #scalability #summary- The Automatic Construction of Large-Scale Corpora for Summarization Research (DM), pp. 137–144.
ICML-1998-LittmanJK #independence #learning #representation- Learning a Language-Independent Representation for Terms from a Partially Aligned Corpus (MLL, FJ, GAK), pp. 314–322.
SIGIR-1998-Jean-David #automation #query- Automatic Acquisition of Terminological Relations from a Corpus for Query Expansion (SJD), pp. 371–372.
SIGIR-1998-RagasK #algorithm #classification- Four Text Classification Algorithms Compared on a Dutch Corpus (HR, CHAK), pp. 369–370.
CIKM-1997-GauchW #analysis #approach #automation #query- A Corpus Analysis Approach for Automatic Query Expansion (SG, JW), pp. 278–284.
SIGIR-1997-Jacquemin - Guessing Morphology from Terms and Corpora (CJ), pp. 156–165.
SIGIR-1997-SilversteinP #clustering #set- Almost-Constant-Time Clustering of Arbitrary Corpus Subsets (CS, JOP), pp. 60–66.
DL-1994-FutrelleZS #documentation #library #natural language- Corpus Linguistics for Establishing The Natural Language Content of Digital Library Documents (RPF, XZ, YS), pp. 165–180.
SIGIR-1992-Krovetz #information retrieval- Corpus Linguistics and Information Retrieval (RK), pp. 348–351.