A Preliminary Statistical Investigation into the impact of an N-Gram Analysis Approach based on Word Syntactic Categories toward Text Author Classification
dc.contributor.author | Diab, Mona | en_US |
dc.contributor.author | Schuster, John | en_US |
dc.contributor.author | Bock, Peter | en_US |
dc.date.accessioned | 2004-05-31T23:04:43Z | |
dc.date.available | 2004-05-31T23:04:43Z | |
dc.date.created | 2000-06 | en_US |
dc.subject | N-gram, Shakespeare, Middleton, Wardigo, Funeral Elegy, Author Classification | en_US |
dc.date.issued | 2000-06-17 | en_US |
dc.description.abstract | Quantitative analysis of literary style has so far relied on semantic elements, namely word counts. This research attempts to identify quantifiable syntactic elements of style that can be used for author identification. To measure syntactic elements, the method uses a dictionary that assigns one part of speech per word and examines phrases delimited by punctuation marks. Sequences of words of different sizes, referred to as grams and represented by their syntactic categories, are counted within each text. Correlations are measured among the gram frequencies of eight texts belonging to four authors, both contemporary and non-contemporary, and are computed separately for each gram size. The same treatment is applied to a target text, the Funeral Elegy. The approach classifies texts by period consistently across the various gram sizes. However, a finer-grained investigation is required to establish the authorship of the Funeral Elegy. (Also cross-referenced as UMIACS-TR-2000-39, LAMP-TR-046) | en_US |
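The abstract describes a pipeline of tagging each word with a single syntactic category, splitting text into punctuation-delimited phrases, counting grams of categories, and correlating the resulting frequency tables. The sketch below illustrates that pipeline under stated assumptions only: the tiny dictionary `POS_DICT`, its tag names, the `UNK` tag for unknown words, and the use of Pearson correlation over the union of observed grams are all hypothetical choices, not the authors' actual dictionary or statistic.

```python
import math
import re
from collections import Counter

# Hypothetical toy dictionary: exactly one part of speech per word.
# The report's real dictionary and tag set are not reproduced here.
POS_DICT = {
    "the": "DET", "a": "DET", "king": "NOUN", "queen": "NOUN",
    "speaks": "VERB", "weeps": "VERB", "and": "CONJ", "sadly": "ADV",
}

def phrases(text):
    """Yield lists of lowercase words, one list per punctuation-delimited phrase."""
    for chunk in re.split(r"[.,;:!?]+", text.lower()):
        words = re.findall(r"[a-z']+", chunk)
        if words:
            yield words

def gram_counts(text, n):
    """Count n-grams of syntactic categories; grams never cross a phrase boundary."""
    counts = Counter()
    for words in phrases(text):
        # Assumption: words missing from the dictionary get a placeholder tag.
        tags = [POS_DICT.get(w, "UNK") for w in words]
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts

def pearson(c1, c2):
    """Pearson correlation of two gram-frequency tables over the union of their grams."""
    keys = sorted(set(c1) | set(c2))
    if not keys:
        return 0.0
    x = [c1[k] for k in keys]  # Counter returns 0 for absent grams
    y = [c2[k] for k in keys]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Usage: compare two short texts at several gram sizes.
text_a = "The king speaks, and the queen weeps."
text_b = "A queen speaks sadly; the king weeps."
for n in (1, 2, 3):
    print(n, round(pearson(gram_counts(text_a, n), gram_counts(text_b, n)), 3))
```

Restricting grams to a single phrase mirrors the abstract's use of punctuation as a delimiter: in `text_a`, the comma prevents a (`VERB`, `CONJ`) bigram from being counted.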
dc.format.extent | 624697 bytes | |
dc.format.mimetype | application/postscript | |
dc.identifier.uri | http://hdl.handle.net/1903/1079 | |
dc.language.iso | en_US | |
dc.relation.isAvailableAt | Digital Repository at the University of Maryland | en_US |
dc.relation.isAvailableAt | University of Maryland (College Park, Md.) | en_US |
dc.relation.isAvailableAt | Tech Reports in Computer Science and Engineering | en_US |
dc.relation.isAvailableAt | UMIACS Technical Reports | en_US |
dc.relation.ispartofseries | UM Computer Science Department; CS-TR-4148 | en_US |
dc.relation.ispartofseries | UMIACS; UMIACS-TR-2000-39 | en_US |
dc.relation.ispartofseries | LAMP-TR-046 | en_US |
dc.title | A Preliminary Statistical Investigation into the impact of an N-Gram Analysis Approach based on Word Syntactic Categories toward Text Author Classification | en_US |
dc.type | Technical Report | en_US |