
The Anchorage

Personal website of Gregory K. Maxey, Commander USN (Retired)

Word Usage & Frequency Report
(A Microsoft Word Help & Tip page by Gregory K. Maxey)

DISCLAIMER/TERMS OF USE

The information, illustrations and code contained in my "Microsoft Word Tips" are provided free and without risk or obligation.


However, the work is mine. If you use it for commercial purposes or benefit from my efforts through income earned or time saved, then a donation, however small, will help to ensure the continued availability of this resource.

If you would like to donate, please use the appropriate donate button to access PayPal. Thank you!


This Microsoft Word Tips & Microsoft Word Help page provides a "Word Usage & Frequency" Word template add-in that calculates and reports word usage and frequency in a document. The add-in provides a user interface, via a userform, for all processing and output options. The add-in can report:

Note: By design, the add-in does not count words contained in headers, footers, or the text frames of shapes anchored in headers or footers.
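If you want to reproduce a similar scope in your own VBA, one common approach is to walk the document's StoryRanges collection and skip the header and footer stories. The sketch below is illustrative only: the function name TextToCount is hypothetical, it assumes that main text, footnotes, endnotes and text boxes should be collected, and it does not try to separate out text boxes anchored in headers or footers.

```vba
Public Function TextToCount(ByVal oDoc As Document) As String
  'Hypothetical helper: gather the text of the stories to be counted,
  'skipping header and footer stories.
  Dim oStory As Range
  Dim strText As String
  For Each oStory In oDoc.StoryRanges
    'Linked stories of the same type (e.g. several text boxes) are reached via NextStoryRange.
    Do
      Select Case oStory.StoryType
        Case wdMainTextStory, wdFootnotesStory, wdEndnotesStory, wdTextFrameStory
          strText = strText & oStory.Text & vbCr
        Case Else
          'Header, footer, comment and separator stories are not collected.
      End Select
      Set oStory = oStory.NextStoryRange
    Loop Until oStory Is Nothing
  Next oStory
  TextToCount = strText
End Function
```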

This add-in has been significantly debugged, redesigned and upgraded with the release of version 2.0. The upgrades include:

The add-in user interface (UI), with its first-use default options, is shown in the following illustrations:

word_freq_1

word_freq_2

The options and settings in this add-in are "sticky." This means that any changes you make to the default options are saved and become the new default options and settings the next time you use the add-in.
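The page does not say where the add-in stores its sticky options. One simple way to get this behavior in VBA is the built-in SaveSetting and GetSetting statements, which write to and read from the current user's registry. A minimal sketch, assuming hypothetical application, section and key names:

```vba
'Hypothetical names; the add-in's actual storage location is not documented here.
Private Const APP_NAME As String = "WordUsageDemo"
Private Const SECTION_NAME As String = "Options"

Public Sub SaveOption(ByVal strKey As String, ByVal strValue As String)
  'Persist the user's choice so it becomes the new default next time.
  SaveSetting APP_NAME, SECTION_NAME, strKey, strValue
End Sub

Public Function LoadOption(ByVal strKey As String, ByVal strDefault As String) As String
  'Return the saved value, or the first-use default if nothing has been saved yet.
  LoadOption = GetSetting(APP_NAME, SECTION_NAME, strKey, strDefault)
End Function
```

A "reset" command could call DeleteSetting for the same application and section so that LoadOption falls back to the first-use defaults.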

Using the "Abbreviations" tab on the multipage control, you can review, modify, or delete the default abbreviations defined in the add-in, or add your own custom abbreviations. With this feature, defined abbreviations appear "with" their terminal period in the output report. You can reset the defined abbreviations to the original add-in defaults or to your last saved abbreviations using the "Reset List(s)" command buttons.

The three output options are illustrated below:

word_freq_3

word_freq_4
Appended to active document

word_freq_5
New document

word_freq_6

Index AutoMark document

Note: The Index AutoMark document can be edited, saved, and then used to automatically index words when you need to create an index in Word. See: How to Create an Index Table Like a Pro with Microsoft Word.
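Once a concordance (AutoMark) document has been saved, Word can mark the entries and build the index from VBA through the Indexes.AutoMarkEntries and Indexes.Add methods. A minimal sketch; the procedure name is hypothetical and the concordance path is whatever location you saved the edited AutoMark document to:

```vba
Public Sub BuildIndexFromConcordance(ByVal strConcordance As String)
  'strConcordance: full path of a saved AutoMark (concordance) document, such as
  'the add-in's Index AutoMark output after you have edited and saved it.
  Dim oRng As Range
  With ActiveDocument
    'Insert an XE (index entry) field at each match listed in the concordance.
    .Indexes.AutoMarkEntries ConcordanceFileName:=strConcordance
    'Add an empty paragraph at the end of the document and build the index there.
    .Content.InsertParagraphAfter
    Set oRng = .Paragraphs.Last.Range
    oRng.Collapse wdCollapseStart
    .Indexes.Add Range:=oRng
  End With
End Sub
```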

Developer's Notes, Tips and General Statement of Fallibility

The original form of the add-in was one of the first complete projects I published on this website. Word MVP Doug Robbins had shared some code with me for counting words in a document, and I simply added a userform to provide a user interface.

Over the years, and with feedback from users, I realized that Doug's work and mine came up a bit short in complex documents containing anything more than simple text.

I have spent a lot of time trying to fix individual issues for users, only to discover that fixing one thing would often break two or three other things.

Counting Words

In working on version 2.0, I started toying with the idea of mimicking Word's built-in "Word Count" feature in my process. How does "Word Count" work? I don't know for sure, as the process is not exposed to VBA novices. Generally, however, "Word Count" counts any text separated by any "white space" as an individual word.

White space, for the purpose of counting, is anything in the text that is intended to separate words. This includes spaces, tabs, paragraph marks, section breaks and, yes (while not actually white), en and em dashes.

That is a relatively simple concept. For example, the text (including the quotation marks) "Jack and Jill went up the hill." consists of seven strings of text separated by six "white spaces." For this text, Word Count returns "Words: 7".
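The white-space rule can be approximated with a pure string function. This is a sketch, not Word's algorithm (which is not documented): it assumes the separators listed above, namely space, tab, paragraph mark (Chr(13)), manual line break (Chr(11)), page/section break (Chr(12)), en dash (ChrW(8211)) and em dash (ChrW(8212)). The function name is hypothetical.

```vba
Public Function CountLikeWordCount(ByVal strText As String) As Long
  'Sketch only: convert each assumed separator to a space, then count the
  'non-empty runs of text between spaces.
  Dim varSep As Variant
  Dim varToken As Variant
  Dim lngCount As Long
  For Each varSep In Array(vbTab, vbCr, Chr(11), Chr(12), ChrW(8211), ChrW(8212))
    strText = Replace(strText, varSep, " ")
  Next varSep
  For Each varToken In Split(strText, " ")
    'Consecutive separators produce empty tokens, which are not words.
    If Len(varToken) > 0 Then lngCount = lngCount + 1
  Next varToken
  CountLikeWordCount = lngCount
End Function
```

For the Jack and Jill text the tokens are "Jack, and, Jill, went, up, the, hill." (with the quotation marks still attached to the first and last), giving seven.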

The challenge for the VBA novice lies in the terms "Words" and "Count" as they relate to the Word object model. To illustrate, open a new blank document, then add and run the following procedure in the document's project module.

VBA Script:
Sub DemoWordCount()
'Code to demo how the Word object model "Words" and "Words.Count" properties
'are radically different from Word's "Word Count" feature.
Dim lngCount As Long, lngIndex As Long
  ActiveDocument.Range.Text = ""
  ActiveDocument.Range.Text = """Jack and Jill went up the hill."""
  lngCount = ActiveDocument.Words.Count
  MsgBox "Words: " & lngCount
  For lngIndex = 1 To lngCount
    'Each member of the Words collection is a Range; test its text.
    If Asc(ActiveDocument.Words(lngIndex).Text) = 13 Then
      MsgBox "Word " & lngIndex & ", which you can't even see, " _
           & "is the end of document paragraph mark."
    Else
      MsgBox ActiveDocument.Words(lngIndex).Text
    End If
  Next lngIndex
lbl_Exit:
  Exit Sub
End Sub

You should notice that the resulting document text consists of the same seven words, but the VBA count and concept of a "word" are radically different! These differences are compounded by an order of magnitude in more complex documents.

Word's "Word Count" feature has the easy task of counting words and simply presenting a number. As the developer of the add-in, my task is to define and list what those counted words are. To do so, I had to make some choices about what is and is not counted and displayed as words or parts of words.

word_freq_7
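Listing the words, rather than just counting them, means tallying each distinct string. A minimal sketch using a late-bound Scripting.Dictionary (no reference to the Scripting Runtime is needed). It assumes case-insensitive matching, so "The" and "the" share one entry keyed by the first spelling seen; it splits on spaces, tabs and paragraph marks only and applies none of the period or grouping-symbol rules discussed below. The function name is hypothetical.

```vba
Public Function WordFrequencies(ByVal strText As String) As Object
  'Returns a Scripting.Dictionary keyed by word, with the occurrence count as the value.
  Dim oDict As Object
  Dim varToken As Variant
  Set oDict = CreateObject("Scripting.Dictionary")
  'Assumption: matching ignores case. CompareMode must be set before any keys are added.
  oDict.CompareMode = vbTextCompare
  strText = Replace(Replace(strText, vbCr, " "), vbTab, " ")
  For Each varToken In Split(strText, " ")
    If Len(varToken) > 0 Then
      'Reading a missing key adds it as Empty, and Empty + 1 = 1.
      oDict(varToken) = oDict(varToken) + 1
    End If
  Next varToken
  Set WordFrequencies = oDict
End Function
```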

Two interesting and tricky problems are the presence of periods "." and grouping/quotation symbols (e.g.,  ""(){}[]) in the text stream.

Periods/Abbreviations

Periods typically terminate sentences. In the simple Jack and Jill example, the word "hill" should be, and is, indexed and listed by the add-in as "hill" without the period. However, periods also terminate abbreviations. If our text is changed to "M. Jack and Mlle. Jill went up the hill." then the abbreviations "M." and "Mlle." should be, and are, indexed and listed by the add-in with the period.

These distinctions are possible in the add-in through the use of defined abbreviations and initials. By defining "M." and "Mlle." in this manner, their terminating period is considered part of the preceding word.
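The idea of keeping a period only for defined abbreviations can be sketched as a lookup against a list. This is a hypothetical illustration, not the add-in's code: it assumes an exact, case-sensitive match on the word including its period, and the function name and sample list are made up.

```vba
Public Function NormalizePeriod(ByVal strWord As String, ByVal varAbbrevs As Variant) As String
  'Keep the terminal period only when the word (period included) is a defined abbreviation.
  Dim varAbbrev As Variant
  If Right$(strWord, 1) <> "." Then
    NormalizePeriod = strWord
    Exit Function
  End If
  For Each varAbbrev In varAbbrevs
    If StrComp(strWord, varAbbrev, vbBinaryCompare) = 0 Then
      NormalizePeriod = strWord
      Exit Function
    End If
  Next varAbbrev
  'Not a defined abbreviation: treat the period as ending the sentence and drop it.
  NormalizePeriod = Left$(strWord, Len(strWord) - 1)
End Function
```

With "pub." in the list, every "pub." keeps its period regardless of context, which is exactly the limitation described in the next paragraph.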

I've included several common abbreviations as defaults in the add-in. As the user, you may add your own custom abbreviations. Keep in mind that the add-in can only try to deduce the proper way to display an encountered word. It can't think!! For example, if you define "pub." as an abbreviation, then the add-in will be quite content to index and display "pub." in "IRS pub. 345" as "pub." with the period. However, if the same document contains "Joe paid his taxes and went to the pub.", the add-in is likewise content to count a second instance of "pub." If some smart guy or gal has a suggestion to improve this process, I'm ready to listen!

word_freq_8

Grouping Symbols

Unless they stand alone and are separated from other text by white space, grouping symbols are not indexed or displayed. In "One (1) ping only, please." the "1" is indexed and displayed without the opening and closing parentheses. There is one special-case exception: where a suffix is set off with opening and closing parentheses (e.g., "Provide lender(s) all case documents."), the word "lender(s)" is indexed and displayed with the parentheses.

word_freq_9
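The grouping-symbol rule can be sketched as stripping symbols from both ends of a token while protecting a "(s)" suffix. Assumptions: the symbol set is straight and curly double quotes plus (){}[] (single quotes are left alone so possessives such as "students'" survive), a token made only of symbols is treated as standing alone and kept, and the function name is hypothetical.

```vba
Public Function StripGrouping(ByVal strWord As String) As String
  'Straight and curly double quotes, parentheses, braces and brackets.
  Dim strSymbols As String
  Dim strOriginal As String
  strSymbols = """(){}[]" & ChrW(8220) & ChrW(8221)
  strOriginal = strWord
  'Strip leading grouping symbols.
  Do While Len(strWord) > 0
    If InStr(strSymbols, Left$(strWord, 1)) = 0 Then Exit Do
    strWord = Mid$(strWord, 2)
  Loop
  'Strip trailing grouping symbols, but keep a "(s)" suffix such as "lender(s)".
  Do While Len(strWord) > 0
    If Len(strWord) > 3 And Right$(strWord, 3) = "(s)" Then Exit Do
    If InStr(strSymbols, Right$(strWord, 1)) = 0 Then Exit Do
    strWord = Left$(strWord, Len(strWord) - 1)
  Loop
  'A token made only of grouping symbols stands alone, so keep it unchanged.
  If Len(strWord) = 0 Then strWord = strOriginal
  StripGrouping = strWord
End Function
```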

Hidden Text and Fields

Generally, for better accuracy and closer agreement with "Word Count," you should ensure that "Show all formatting marks" and particularly "hidden text" are turned off. You should also ensure that fields are displaying their results and not their codes.

If they are left displayed, the add-in will index and display any hidden text or field code text in the document, as illustrated below. Some users may see a benefit in this behavior, so I have left it up to the user whether or not to display this type of text.

word_freq_10
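The view settings mentioned above can be set from VBA through the window's View object. Alternatively, a Range's TextRetrievalMode controls whether hidden text and field codes are included when you read its Text, independent of what is on screen. A sketch with hypothetical procedure names:

```vba
Public Sub PrepareViewForCounting()
  'Hide formatting marks and hidden text, and show field results rather than codes.
  With ActiveWindow.View
    .ShowAll = False
    .ShowHiddenText = False
    .ShowFieldCodes = False
  End With
End Sub

Public Function VisibleText(ByVal oRng As Range) As String
  'Read text without hidden text or field codes; work on a duplicate so the
  'caller's Range object is left as it was.
  Dim oWork As Range
  Set oWork = oRng.Duplicate
  With oWork.TextRetrievalMode
    .IncludeHiddenText = False
    .IncludeFieldCodes = False
  End With
  VisibleText = oWork.Text
End Function
```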

Document Changes

For all practical purposes, the add-in leaves the text of your original document unchanged. The two exceptions are when you choose the option to append results to the original document, and when the document contains extraneous white space between the last instance of text and the end of the document, as illustrated below.

word_freq_11

As part of processing, the add-in will remove any extraneous white space found at the end of the processing range.

word_freq_12
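Removing trailing white space like this can be sketched as repeatedly deleting the character just before the final paragraph mark (which Word does not allow you to delete) while that character is white space. This is a hypothetical illustration, not the add-in's code; it assumes spaces, tabs, empty paragraphs, manual line breaks and no-break spaces count as extraneous, and it deliberately leaves section breaks alone.

```vba
Public Sub TrimTrailingWhiteSpace(ByVal oDoc As Document)
  'Delete white space between the last text and the final paragraph mark.
  Dim oChar As Range
  Dim lngEnd As Long
  Do
    Set oChar = oDoc.Content
    'Step back over the final paragraph mark.
    oChar.End = oChar.End - 1
    If oChar.End <= oChar.Start Then Exit Do
    'Narrow the range to the single character before the final mark.
    oChar.Start = oChar.End - 1
    Select Case oChar.Text
      Case " ", vbTab, vbCr, Chr(11), ChrW(160)
        lngEnd = oDoc.Content.End
        oChar.Delete
        'Stop if nothing was removed (e.g. protected content) to avoid an endless loop.
        If oDoc.Content.End = lngEnd Then Exit Do
      Case Else
        Exit Do
    End Select
  Loop
End Sub
```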

Note: If the changes described above are unsuitable for your needs, then please make a copy of your original document. You can then process the copy without any changes being made to your original document.
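If you prefer to make the working copy with code, one way is to create a new document and transfer the main text with Range.FormattedText. A minimal sketch with a hypothetical function name; it copies only the main text story, and the new document is based on the Normal template, so styles may differ slightly from the original.

```vba
Public Function MakeWorkingCopy(ByVal oSource As Document) As Document
  'Create a new document holding a formatted copy of the source's main text,
  'so any processing changes land in the copy and not the original.
  Dim oCopy As Document
  Set oCopy = Documents.Add
  oCopy.Content.FormattedText = oSource.Content.FormattedText
  Set MakeWorkingCopy = oCopy
End Function
```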

Speed

There are a lot of things going on behind the scenes when you use the add-in. Processing long documents can take considerable time, and processing very long documents can take a very long time. For example, processing a 50-page legal contract on my relatively efficient, high-speed PC takes about 20 seconds.

So you don't lose hope and think everything has stalled or failed, I've included a progress report that updates during processing.

word_freq_13
Processing words.

word_freq_14
Sorting processed words.

word_freq_15
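The add-in's progress report is a form, as shown above. A lighter-weight variant for your own macros is to write to Word's status bar. The sketch below assumes an update interval of every 100 items (an arbitrary choice, since redrawing on every word would itself slow processing); the procedure names are hypothetical.

```vba
Public Function ProgressMessage(ByVal strStage As String, ByVal lngDone As Long, _
                                ByVal lngTotal As Long) As String
  'Build a status line such as "Processing words: 50 of 200 (25%)".
  If lngTotal <= 0 Then
    ProgressMessage = strStage
  Else
    ProgressMessage = strStage & ": " & lngDone & " of " & lngTotal & _
                      " (" & Format(lngDone / lngTotal, "0%") & ")"
  End If
End Function

Public Sub ReportProgress(ByVal strStage As String, ByVal lngDone As Long, _
                          ByVal lngTotal As Long)
  'Update only every 100 items, and always on the last one.
  If lngDone Mod 100 = 0 Or lngDone = lngTotal Then
    Application.StatusBar = ProgressMessage(strStage, lngDone, lngTotal)
    'Let Word repaint so the user can see that processing has not stalled.
    DoEvents
  End If
End Sub
```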

Imperfect Process

I have spent dozens and dozens of hours, if not over a hundred, surrounded by bits of bloody scalp and hair, in an attempt to make the add-in as robust and functional as possible.

I am rather pleased with the result, but I know that it is not perfect. Some user will undoubtedly create and attempt to process text that will highlight a fault or unhandled condition. My experience during development was that when it failed, it failed miserably. I suggest you compare the result with Word's "Word Count"; if the two counts are identical, you can feel reasonably confident in the result. Keep in mind that Word Count always counts list paragraph text (which I think is stupid) and the add-in doesn't unless you specify it.
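The comparison can also be made in code with Range.ComputeStatistics, which returns the same kind of statistic as the Word Count dialog. A minimal sketch with a hypothetical function name; it measures only the main text story (so headers and footers are excluded, as they are by the add-in), and it will not match a Word Count that includes text boxes, footnotes or endnotes.

```vba
Public Function MatchesWordCount(ByVal oDoc As Document, ByVal lngAddInCount As Long) As Boolean
  'Compare a count against Word's own word statistic for the main text story.
  Dim lngWordCount As Long
  lngWordCount = oDoc.Content.ComputeStatistics(wdStatisticWords)
  MatchesWordCount = (lngWordCount = lngAddInCount)
End Function
```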

If you find a fault or have suggestions for improvement, please let me know using the feedback link on this website.

Now the Add-In:

Version 2.4 was created using Word 2010.  It is wholly functional with Word 2007/2010/2013.  You can initiate a word usage report by clicking the "Process/Report" control in the "Word Usage" group of the ribbon Add-Ins tab.

A .dot format version of the add-in (version 2.4) is included in the download package for Word 2003 users. With this version, users can initiate a word usage report by clicking the "Process Report" control on the custom "Word Usage" menu.

Download the templates here: Word Usage and Frequency.zip.

For more on template add-ins and how to load them, see Organizing Your Macros/Template Add-ins at: Installing Macros

That's it! I hope you have found this Microsoft Word Help & Tips page useful and informative.
