Mining the Web for Bilingual Text
dc.contributor.author | Resnik, P. | en_US |
dc.date.accessioned | 2004-05-31T23:05:15Z | |
dc.date.available | 2004-05-31T23:05:15Z | |
dc.date.created | 2000-06 | en_US |
dc.date.issued | 2000-06-15 | en_US |
dc.description.abstract | STRAND (Resnik, 1998) is a language-independent system for automatic discovery of text in parallel translation on the World Wide Web. This paper extends the preliminary STRAND results by adding automatic language identification, scaling up by orders of magnitude, and formally evaluating performance. The most recent end product is an automatically acquired parallel corpus comprising 2491 English-French document pairs, approximately 1.5 million words per language. (Also cross-referenced as UMIACS-TR-2000-44 and LAMP-TR-051) | en_US |
dc.format.extent | 637884 bytes | |
dc.format.mimetype | application/postscript | |
dc.identifier.uri | http://hdl.handle.net/1903/1084 | |
dc.language.iso | en_US | |
dc.relation.isAvailableAt | Digital Repository at the University of Maryland | en_US |
dc.relation.isAvailableAt | University of Maryland (College Park, Md.) | en_US |
dc.relation.isAvailableAt | Tech Reports in Computer Science and Engineering | en_US |
dc.relation.isAvailableAt | UMIACS Technical Reports | en_US |
dc.relation.ispartofseries | UM Computer Science Department; CS-TR-4153 | en_US |
dc.relation.ispartofseries | UMIACS; UMIACS-TR-2000-44 | en_US |
dc.relation.ispartofseries | LAMP-TR-051 | en_US |
dc.title | Mining the Web for Bilingual Text | en_US |
dc.type | Technical Report | en_US |