Files Generated by Word Occurrence Experiments

Our Experiment Seventeen runs custom software written in Python that analyzes the text of the First Folio of William Shakespeare (1623) and produces a set of files of assorted types. All are available for free download through the graphical user interface provided below.

Here are two pages introducing the Word Cipher available on this site:

Optimally Concise Summary of the Word Cipher
Shakespearean Nightmare Words

The first table below uses the file naming convention {‘Work name’ + ‘extension’}; for example, ‘Hamlet.csv’.

The four Word Cipher ‘Guide Words’ (so frequently mentioned) are: Fortune, Nature, Honour and Reputation.

The generating source code is available through links in Experiment Seventeen, both as the Jupyter Notebook used by Colab and as the Python source code stored on GitHub.

Of particular interest is discovering and cataloging the vividly anomalous Word Clusters that have always been known to exist within the Works of William Shakespeare. To this day they remain an enigma to Scholars, even though the First Folio is widely regarded as the most studied, analyzed and picked-over written document in the history of the English Language.

Files produced by a run of the custom app are deposited into a folder named with the day’s date, in the form ‘03-15-2024’.
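
As a rough illustration of the folder and file naming described above, the following sketch builds a date-named output folder and a per-Work file path. The function names `run_folder_name` and `output_path`, and the base folder `output`, are hypothetical; they are not taken from the Experiment Seventeen source code.

```python
from datetime import date
from pathlib import Path


def run_folder_name(run_date: date) -> str:
    """Format a date as MM-DD-YYYY, matching the example '03-15-2024'."""
    return run_date.strftime("%m-%d-%Y")


def output_path(base_dir: Path, run_date: date, work_name: str, extension: str) -> Path:
    """Combine the dated folder with the 'Work name' + 'extension' convention."""
    return base_dir / run_folder_name(run_date) / f"{work_name}{extension}"


if __name__ == "__main__":
    # Hypothetical base folder; the real app may write somewhere else.
    print(output_path(Path("output"), date(2024, 3, 15), "Hamlet", ".csv"))
```

Zero-padding the month and day keeps every folder name the same length, whatever the date.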

| File Purpose | Extension |
| --- | --- |
| The original source text used as input to the Python programs in the Experiments, supplied by Project Gutenberg, unmodified except that the XX-long copyright notice has been excised (I wonder what it said…) | .txt |
| PDF of the above. The page count of a PDF document is system-dependent but consistent across documents. | .pdf |
| The table of word occurrences extracted from the Work, after filtering out Stop Words. | .csv |
| A table of word occurrences for the four Guide Words, extracted from the Works, with partial-word matching applied so that (for example) both ‘honour’ and ‘honor’ are included, as are both ‘reputation’ and ‘reputable’. | …guide words.csv |
| Original text made all lowercase, with punctuation stripped and Guide Words color-coded. To Do: save pages as PDF, then subsample down to a PNG of a size suitable for Machine Vision, then use existing Clustering ‘maths’, as is common in Medical Imaging, to determine maximum guide word clustering, fully automated and untouched by Human Hands. | …page.html |
| Original sentences made all lowercase, with Guide Words color-coded within the sentences. The totals presented are page-oriented, not sentence-oriented. | …sentences.html |
| Word cloud generated from the Word Occurrences .csv file. | .png |
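
To make the two .csv rows in the table above more concrete, here is a minimal sketch of how word occurrences and Guide Word occurrences might be counted. It is not the code from Experiment Seventeen. The stop-word list is a tiny placeholder, the stems in `GUIDE_STEMS` are illustrative guesses, and prefix matching is only one possible reading of "partial words"; all function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny placeholder stop-word list; the real list is not shown here.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "is", "my", "that", "it"}

# Assumption: illustrative stems chosen so that 'honour'/'honor' and
# 'reputation'/'reputable' both match, as the table describes.
GUIDE_STEMS = {
    "Fortune": "fortun",
    "Nature": "natur",
    "Honour": "hono",
    "Reputation": "reput",
}


def normalize(text: str) -> list[str]:
    """Lowercase the text and keep only runs of the letters a-z as words."""
    return re.findall(r"[a-z]+", text.lower())


def word_occurrences(words: list[str]) -> Counter:
    """Count every word that is not in the stop-word list."""
    return Counter(w for w in words if w not in STOP_WORDS)


def guide_word_occurrences(words: list[str]) -> dict[str, int]:
    """Count words that start with each Guide Word stem (prefix matching)."""
    return {
        guide: sum(1 for w in words if w.startswith(stem))
        for guide, stem in GUIDE_STEMS.items()
    }


if __name__ == "__main__":
    sample = "Honour and honor: the reputation of a reputable man is Fortune's gift to Nature."
    words = normalize(sample)
    print(word_occurrences(words).most_common(5))
    print(guide_word_occurrences(words))
```

Prefix matching would miss a word such as ‘dishonour’; a substring test (`stem in w`) would catch it, at the cost of more accidental matches.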

The following presents aggregated totals for all 37 Works:

Master Table of Guide Word Occurrences

