A Preliminary Statistical Investigation into the impact of an N-Gram Analysis Approach based on Word Syntactic Categories toward Text Author Classification

dc.contributor.author: Diab, Mona
dc.contributor.author: Schuster, John
dc.contributor.author: Bock, Peter
dc.date.accessioned: 2004-05-31T23:04:43Z
dc.date.available: 2004-05-31T23:04:43Z
dc.date.created: 2000-06
Key words: N-gram, Shakespeare, Middleton, Wardigo, Funeral Elegy, Author Classification
dc.date.issued: 2000-06-17
dc.description.abstract: Quantitative analysis of literary style has heretofore relied on semantic elements, namely word counts. This research attempts to identify quantifiable syntactic elements of style that can be used for author identification. The syntactic elements are measured using a dictionary that assigns one part of speech per word, applied to phrases delimited by punctuation marks. Sequences of words of different sizes, referred to as grams, are counted within each text. Correlations are measured among the gram frequencies of eight texts by four authors, both contemporary and non-contemporary, and are computed separately for each gram size. The same treatment is applied to a target text, the Funeral Elegy. The approach classifies texts by period consistently across the various gram sizes. However, a finer-grained investigation is required to establish the authorship of the Funeral Elegy. (Also cross-referenced as UMIACS-TR-2000-39, LAMP-TR-046)
dc.format.extent: 624697 bytes
dc.format.mimetype: application/postscript
dc.identifier.uri: http://hdl.handle.net/1903/1079
dc.language.iso: en_US
dc.relation.isAvailableAt: Digital Repository at the University of Maryland
dc.relation.isAvailableAt: University of Maryland (College Park, Md.)
dc.relation.isAvailableAt: Tech Reports in Computer Science and Engineering
dc.relation.isAvailableAt: UMIACS Technical Reports
dc.relation.ispartofseries: UM Computer Science Department; CS-TR-4148
dc.relation.ispartofseries: UMIACS; UMIACS-TR-2000-39
dc.relation.ispartofseries: LAMP-TR-046
dc.title: A Preliminary Statistical Investigation into the impact of an N-Gram Analysis Approach based on Word Syntactic Categories toward Text Author Classification
dc.type: Technical Report
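
The abstract above outlines a pipeline: tag each word with a single part of speech, split text into punctuation-delimited phrases, count grams of syntactic categories within each phrase, and correlate gram frequencies between texts for each gram size. The record does not give the dictionary, tag set, punctuation set, or correlation measure, so what follows is only a minimal Python sketch under stated assumptions. The toy dictionary POS_DICT and its tag names, the delimiter set PHRASE_DELIMS, the "UNK" tag for unknown words, and the use of Pearson correlation over relative gram frequencies are all hypothetical illustration choices, not the report's actual method.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy dictionary: each word maps to exactly ONE part of speech,
# mirroring the "one part of speech per word" constraint in the abstract.
POS_DICT = {
    "the": "DET", "a": "DET", "my": "DET",
    "man": "NOUN", "love": "NOUN", "grave": "NOUN", "heart": "NOUN",
    "is": "VERB", "lies": "VERB", "was": "VERB",
    "cold": "ADJ", "sweet": "ADJ",
    "in": "PREP", "of": "PREP",
}

# ASSUMPTION: these characters are treated as phrase-delimiting punctuation.
PHRASE_DELIMS = r"[.,;:!?()]"


def phrases(text):
    """Split lower-cased text into phrases at punctuation; drop empty pieces."""
    return [p for p in re.split(PHRASE_DELIMS, text.lower()) if p.strip()]


def tag_phrase(phrase):
    """Map each word to its single dictionary tag; unknown words become 'UNK'."""
    words = re.findall(r"[a-z']+", phrase)
    return [POS_DICT.get(w, "UNK") for w in words]


def gram_counts(text, n):
    """Count n-grams of syntactic categories; grams never span a punctuation mark."""
    counts = Counter()
    for phrase in phrases(text):
        tags = tag_phrase(phrase)
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts


def pearson(c1, c2):
    """ASSUMED measure: Pearson r of relative gram frequencies over the union of grams."""
    grams = sorted(set(c1) | set(c2))
    t1, t2 = sum(c1.values()), sum(c2.values())
    if not grams or t1 == 0 or t2 == 0:
        return float("nan")
    x = [c1[g] / t1 for g in grams]  # Counter returns 0 for absent grams
    y = [c2[g] / t2 for g in grams]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation undefined for constant profiles
    return cov / (sx * sy)


if __name__ == "__main__":
    text_a = "The man is cold; my heart lies in the grave."
    text_b = "A love was sweet, the grave is cold."
    for n in (1, 2, 3):
        r = pearson(gram_counts(text_a, n), gram_counts(text_b, n))
        print(n, round(r, 3))
```

Run as a script, the sketch prints one correlation per gram size for two toy sentences, echoing the abstract's per-gram-size comparison. Restricting grams to single phrases means, for example, that the adjective ending "The man is cold" and the determiner starting "my heart" never form a bigram across the semicolon. A real study would substitute a full tagged lexicon and whole texts for the toy inputs.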

Files

Original bundle
Name: CS-TR-4148.ps
Size: 610.06 KB
Format: Postscript Files

Name: CS-TR-4148.pdf
Size: 83.52 KB
Format: Adobe Portable Document Format
Description: Auto-generated copy of CS-TR-4148.ps