Mining the Web for Bilingual Text

dc.contributor.author | Resnik, P. | en_US
dc.date.accessioned | 2004-05-31T23:05:15Z
dc.date.available | 2004-05-31T23:05:15Z
dc.date.created | 2000-06 | en_US
dc.date.issued | 2000-06-15 | en_US
dc.description.abstract | STRAND (Resnik, 1998) is a language-independent system for automatic discovery of text in parallel translation on the World Wide Web. This paper extends the preliminary STRAND results by adding automatic language identification, scaling up by orders of magnitude, and formally evaluating performance. The most recent end-product is an automatically acquired parallel corpus comprising 2491 English-French document pairs, approximately 1.5 million words per language. (Also cross-referenced as UMIACS-TR-2000-44) (Also cross-referenced as LAMP-TR-051) | en_US
dc.format.extent | 637884 bytes
dc.format.mimetype | application/postscript
dc.identifier.uri | http://hdl.handle.net/1903/1084
dc.language.iso | en_US
dc.relation.isAvailableAt | Digital Repository at the University of Maryland | en_US
dc.relation.isAvailableAt | University of Maryland (College Park, Md.) | en_US
dc.relation.isAvailableAt | Tech Reports in Computer Science and Engineering | en_US
dc.relation.isAvailableAt | UMIACS Technical Reports | en_US
dc.relation.ispartofseries | UM Computer Science Department; CS-TR-4153 | en_US
dc.relation.ispartofseries | UMIACS; UMIACS-TR-2000-44 | en_US
dc.relation.ispartofseries | LAMP-TR-051 | en_US
dc.title | Mining the Web for Bilingual Text | en_US
dc.type | Technical Report | en_US

Files

Original bundle
Name: CS-TR-4153.ps
Size: 622.93 KB
Format: Postscript Files
Name: CS-TR-4153.pdf
Size: 204.22 KB
Format: Adobe Portable Document Format
Description: Auto-generated copy of CS-TR-4153.ps