-*-Text-*-
This is the file ispell.info, which documents the ISPELL spelling
checker/corrector.

File: ISPELL Node: Top Up: (DIR) Next: Correcting

1. INTRODUCTION

This memo documents a spelling check/correction program maintained on
the four MIT ITS computers and on the MIT-XX Twenex computer. The
program is called "SPELL" on ITS and "ISPELL" (for ITS-style SPELL) on
Twenex. (The name ISPELL is used on Twenex to avoid conflict with the
standard DEC TOPS-20 program called SPELL. This memo does NOT apply to
the DEC TOPS-20 program.)

The program's behavior is nearly identical on ITS and Twenex. (The
principal difference is that arguments to a command are separated by
commas on ITS and spaces on Twenex.)

A command consists of a command name and its arguments on one line,
for example

        SET SCRIBE
        LOAD MYDICT,3
        CORRECT FOO BAR,FOO OUT
        CORRECT FOO BAR,FOO OUT/13
        ASK PROCEDE

SPELL is designed to process input text for text formatters TJ6, R,
SCRIBE, PUB, and TEX. It has the necessary knowledge about the command
structure of these programs to avoid attempting to check the spelling
of formatter commands. It can be told by means of special comments to
suppress checking of designated portions of the text.

SPELL's querying mechanism for correcting misspelled words works best
when it is used from a display terminal, but SPELL may be effectively
used from a printing terminal or slow display, especially if the
"context display" and "alternate spellings display" (see below) are
disabled. It may be used effectively from uppercase-only terminals,
since it uses the capitalization from the original word in the file
when a spelling correction is typed in.

SPELL has a dictionary of about 38000 words and requires about 43K of
memory to run.

Report bugs and complaints to BUG-SPELL@MIT-ML.

* Menu:

* Correcting::          The principal operation
* Training::            Making a list of all unknown words from a file
* Asking::              Asking about specific words, and other fun and
                        games
* Emacs:(EMACS)Fixit    Using SPELL from EMACS to check the spelling of
                        one word
* Load/Dump::           Loading and dumping dictionaries
* Options::             Switches that control formatter mode etc.
* Syntax::              Syntax of commands and file names, and operation
                        from "JCL"
* Esoterica::

File: ISPELL Node: Correcting Previous: Top Up: Top Next: Training

2. The CORRECT command

The principal operation which SPELL performs is that of "correcting" a
file. This consists of reading the file and optionally writing an
output file in which misspellings are corrected. The corrections are
made either by substituting a word known to SPELL or by retyping the
word manually. Whether output is being written or not, every
misspelling (that is, every word not known to SPELL) is displayed, and
the user is queried about the action to be taken.

The command name "CORRECT" is followed by the filespec of the input
file, the optional filespec of the output file, and an optional switch
specifying the starting line number. If this line number is given, all
earlier lines in the input file are copied to the output file without
being checked. If no output is desired, omit the second file
specification.

        CORRECT INPUT FILE,OUTPUT FILE
        CORRECT INPUT FILE,OUTPUT FILE/13
        CORRECT INPUT FILE              (produce no corrected output)

2.1 Suppressing checking of designated portions of the text

When SPELL is being operated in a formatter mode, it can suppress
checking of regions containing figures drawn with strange fonts or
other nonsensical-looking text. It does this by recognizing comments
of the form (for the R formatter) "^K &&&SPELLON" or "^K &&&SPELLOFF".
The comments must be exactly as shown: a control K, space, three
ampersands, and the word entirely in capitals. What follows later on
the line is immaterial.
The SPELLOFF command disables checking until the next SPELLON command.
The unchecked text is still copied into the output file.

The commands are:

        R       "^K &&&SPELLON" and "^K &&&SPELLOFF"
        TEX     "% &&&SPELLON" and "% &&&SPELLOFF"
        TJ6     ".C &&&SPELLON" and ".C &&&SPELLOFF"
        PUB     ".<< &&&SPELLON" and ".<< &&&SPELLOFF"
        SCRIBE  "@comment(&&&SPELLON)" and "@comment(&&&SPELLOFF)"

Remember that SPELL does not expand formatter macros. While it would
be convenient to put SPELLOFF and SPELLON commands in macros, it would
not have the desired effect. SPELL responds to the commands only when
they literally appear in the text.

2.2 Querying during a correction

When SPELL finds an unknown word during a correction, it displays:

(1) The unknown word.

(2) If the "LIST" option is on (which it normally is), a list of words
    that are known to SPELL and are close to the unknown word. One of
    these might be the word that was intended. These words are
    displayed with index numbers beginning with zero.

(3) If the "DISPLAY" option is on (which it normally is), the line
    number and a few lines of the file showing the context in which
    the word appeared.

The user then types one of the commands listed below. These commands
are acted on IMMEDIATELY. No carriage return is typed after them. The
intention is to allow rapid interaction with the program. However,
care is required to avoid typing the wrong thing.

A or
        Accept the word, but do not put it in the dictionary. It will
        still be unknown, so SPELL will query again if it encounters
        this word again.

I
        Accept the word and put it into dictionary number 1. The word
        will henceforth be known, so SPELL will not query about it
        again. If the dictionary is written out at the end of the run,
        it can be read back in on future runs and SPELL will know all
        such words.

D1 to D9
        Accept the word and put it into dictionary number N.

0 to 9
        Substitute the numbered choice for the unknown word. (This is
        only useful if corrected output is being written.) The
        corrected word will be given the same capitalization as the
        unknown word, if possible. If not possible, SPELL will say so.

R
        The user types in a new word to replace the unknown one. (This
        is only useful if corrected output is being written.) As
        above, the capitalization of the original word will be used.
        The capitalization of the typed-in word is ignored. (Hence
        SPELL can be operated effectively from upper-case-only
        terminals.)

+/-
        Sets or clears the indicated option, just as for the SET or
        CLEAR command at the top level. The letter typed after the "+"
        or "-" is a single letter abbreviation for the option name. It
        is the first letter of the option name, except for TJ6, whose
        abbreviation is "J". After giving this command, the user must
        still dispose of the unknown word.

W
        The unknown word, and all future words, are accepted. That is,
        the rest of the input file is immediately copied to the output
        file without further checking. This is sometimes useful if the
        system is going down in one minute. It also complements the
        optional starting line feature of the CORRECT command.

^L
        The display is refreshed.

^G
        The entire correction operation is aborted, and SPELL goes
        back to the top level to await the next operation command.

?
        A brief summary of the available commands is displayed,
        showing the currently selected options.

File: ISPELL Node: Training Previous: Correcting Up: Top Next: Asking

3. The TRAIN command

This is a sort of "off-line" version of correcting. Instead of
querying the user, SPELL lists all unknown words in an "exceptions"
file, and puts them into dictionary number 1 also. It is often
possible to determine quickly whether a file contains errors by
training and then examining the exceptions file.

The word "TRAIN" is followed by the filespecs of the input file and
the exceptions file.

SPELL can suppress its training of designated portions of the text in
exactly the same way as during corrections, by using the "&&&SPELLON"
and "&&&SPELLOFF" comments.

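As an illustration only, the TRAIN pass described above can be
sketched in Python. This is not SPELL's actual code. The function name
`train`, the use of a Python set to stand in for the dictionaries, the
folding of words to upper case, and the simplified marker test (the
marker is matched anywhere on a line rather than as the exact
formatter-specific comment) are all assumptions made for the sketch.
The word pattern and the treatment of one-letter and very long words
follow the word identification rules given under Esoterica.

```python
import re

# A word: letters and apostrophes, not beginning or ending with an apostrophe.
WORD = re.compile(r"[A-Za-z]+(?:'+[A-Za-z]+)*")


def train(lines, known):
    """Return unknown words in order of first appearance.

    `known` is a set of upper-case words standing in for SPELL's
    dictionaries; each unknown word is added to it, as TRAIN adds
    unknown words to dictionary number 1.
    """
    exceptions = []
    checking = True
    for line in lines:
        # Simplification (assumption): accept the marker anywhere on the
        # line instead of requiring the exact comment syntax.
        if "&&&SPELLOFF" in line:
            checking = False
            continue
        if "&&&SPELLON" in line:
            checking = True
            continue
        if not checking:
            continue
        for match in WORD.finditer(line):
            word = match.group().upper()
            if len(word) == 1 or len(word) > 40:
                continue  # considered correctly spelled
            if word not in known:
                known.add(word)
                exceptions.append(word)
    return exceptions
```

In this sketch the returned list plays the role of the exceptions
file; a caller could write it out one word per line.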
File: ISPELL Node: Asking Previous: Training Up: Top Next: Load/Dump

4. The ASK command

This is used for manual interrogation of the dictionary about a single
word. The command is "ASK" followed by the word. SPELL will indicate
whether it knows the word. If the word is known, SPELL will also
indicate whether suffix removal was used, and whether the word has
suffix flags. (Users generally do not need to be concerned with this
information.) If the word is not known, SPELL will list similar words
that it knows, if any.

5. The JUMBLE command

This command is provided to solve "scrambled word" problems of the
sort that appear in some newspapers. The command "JUMBLE" is followed
by the string (not more than 8 letters). All permutations of the
letters in the string that are known to SPELL are displayed. Because
the program is not clever about redundant permutations, it will
display words more than once if they contain duplicated letters.

File: ISPELL Node: Load/Dump Previous: Asking Up: Top Next: Options

6. The LOAD and DUMP commands

The LOAD command reads a file and places the words in it into one of 9
dictionaries that SPELL has in addition to its main dictionary. The
word LOAD is followed by the dictionary filespec and an optional
dictionary number (1 through 9). If the number is not given,
dictionary number 1 will be loaded. (One incremental dictionary is
usually sufficient.) The "DUMP" operation has the same syntax and
writes the indicated dictionary onto a file.

A word may be in only one dictionary. If it is in any dictionary it is
"known" for purposes of correcting, training, etc. When loading a
dictionary, SPELL ignores any word that it already knows, even if it
is in a different dictionary than the one specified.

Incremental dictionaries are useful for maintaining private lists of
words that appear in one's files and are known to be correct but are
not in SPELL's main dictionary.
When correcting a file, the user should type "I" when such a word is
encountered, and then dump the incremental dictionary at the end of
the run. This dictionary may then be read back in at the start of the
next run.

Dictionaries other than number 1 are occasionally useful for
maintaining different lists of jargon words or words specific to
particular subjects.

File: ISPELL Node: Options Previous: Load/Dump Up: Top Next: Syntax

7. The SET and CLEAR commands

SPELL has 7 "option" flags to control its operation:

"R" option
        Reading "R" text: ignore commands, font indicators, comments,
        string and number register names, and inline macro names. It
        correctly handles "hexadecimal" fonts as in ^FAword^F*.

        Warning: Since command lines are ignored, titles appearing in
        chapter, section, and figure macros are not checked and should
        be inspected manually. The same is true of string registers
        that are to expand to actual text. Also, SPELL is not clever
        about things that are quoted or translated, or other obscure
        features of text formatters. For example, it thinks "^Q^K"
        starts a comment, because it doesn't know about the action of
        ^Q. This is true, to varying degrees, for all of the formatter
        options.

"TJ6" option
        Reading TJ6 text: ignore spelling of command lines and font
        indicators. See the warning under the "R" option.

"SCRIBE" option
        Reading SCRIBE text: ignore all words preceded by "@", and the
        arguments to those commands whose arguments are not expected
        to be text. Only comments of the form @comment(...) are
        recognized and ignored. Comments of the form
        @begin(comment)...@end(comment) are not. The latter types of
        comments may be protected by additional "&&&SPELLOFF" and
        "&&&SPELLON" comments, if desired. See the warning under the
        "R" option.

"PUB" option
        Reading PUB text: like TJ6.

"TEX" option
        Reading TEX text: ignore commands, font indicators, and
        comments. See the warning under the "R" option.

"LIST" option
        List suggested alternate spellings when it encounters an
        unknown word. If you don't care about the alternate spellings,
        you can make it run faster by turning this off.

"DISPLAY" option
        Display the context and line number of the unknown word. It
        saves wear and tear on printing terminals to turn this off if
        you don't really need it.

Options are turned on by the command SET (option name), off by CLEAR
(option name). At present, only the first letter of the word following
SET or CLEAR is looked at, and "J" must be used to denote TJ6, so that
"T" can denote TEX. The HELP command displays the currently enabled
options.

Only one of the formatter options ("TJ6", "R", "PUB", "SCRIBE", and
"TEX") may be on at one time. Turning any of them on clears the
others. The initial options are "R", "LIST", and "DISPLAY".

In addition to the top level commands SET and CLEAR, options may be
set or cleared when SPELL is querying during a correction. To do this,
type a plus sign to set, or a minus sign to clear, followed by a
SINGLE LETTER abbreviation for the option (or "J" for TJ6). For
example, if you had DISPLAY off and you decide you want to see the
context of a particular word after all, you can turn it on by typing
"+D".

8. The QUIT and KILL commands

"KILL" exits from SPELL and kills the job. "QUIT" exits but allows it
to be restarted.

File: ISPELL Node: Syntax Previous: Options Up: Top Next: Esoterica

9. ENTERING AND EDITING COMMANDS

A command consists of a command name ("CORRECT" etc.) followed by
whatever arguments (e.g. filespecs) are appropriate, terminated by a
carriage return. The command name may be abbreviated to any
unambiguous prefix.

In the command line, rubout deletes the last character. ^U deletes the
entire line, allowing one to start over. ^R redisplays the material
typed so far, in case the screen got messed up. A question mark gives
a help message. ^Q quotes the following character in a filename.

Filespecs and other arguments are separated from each other by commas,
for example

        CORR FOO BAR,FOO OUT
        LOAD MYDICT,3

To specify the starting line number for a correction operation, follow
the last filespec by a slash and the number, e.g.

        CORR FOO BAR,FOO OUT/13

In the JCL line, an altmode is equivalent to a carriage return in
terminating the command, making it possible to put multiple commands
into a JCL string meaningfully.

In filenames, the Sname initially defaults to the user's working
directory, and is "sticky", i.e. will subsequently default to the last
specification used. The device and second filename always default to
"DSK:" and ">", and are not sticky. The first filename must always be
typed: it has no default value.

10. STARTING SPELL WITH A COMMAND LINE

If a command line ("JCL") is supplied when SPELL is started, it will
use that line, for as long as it lasts, instead of taking commands
from the terminal. Any command error cancels the rest of the command
line and reverts to interactive operation. Since a carriage return
can't be put into the command line, SPELL allows altmode to separate
commands. This is allowed only when the command is being read from
JCL.

example:

        :SPELL LOAD MYDICT$CORRECT FOO,BAR

loads dictionary MYDICT > and corrects FOO >, writing corrected output
to BAR >.

        :SPELL LOAD MYDICT$TRAIN FOO,FOO ERRS$KILL

Loads MYDICT > and trains FOO >, putting exceptions into FOO ERRS, and
then kills itself.

File: ISPELL Node: Esoterica Previous: Syntax Up: Top

11. ESOTERICA

The following information should not be necessary for normal operation
of SPELL. It is included for completeness.

11.1 File operations

SPELL reads and writes files in "block image" mode, using blocks of
128 words. On writing, the last word is padded with ^C. On reading,
any padding of ^@, ^A, ^B, or ^C at the end of the last word is
removed.

11.2 Word identification algorithm

A word is any uninterrupted sequence of letters and apostrophes, which
does not begin or end with an apostrophe.
Any punctuation, digit, or control character separates words. Words
are not checked if they are being ignored under control of the text
formatter mode. Any word consisting of a single letter, or any word
more than 40 letters long, is considered to be correctly spelled.

11.3 Closeness algorithm

Two words are considered to be "close" if they differ in only one of
the following ways:

        Two adjacent letters are interchanged.
        One letter is different.
        One letter is missing in one and present in the other.

This criterion is used in suggesting close words when an unknown word
is found while correcting a file or when an unknown word is asked for
with the "A" command. Hence the word SEQUENCE will be suggested if the
program encounters SEUQENCE, SERQUENCE, SEQUNCE, or SEQUENCW.

11.4 Dictionary policy

It is the policy of this program to contain only one spelling of a
word, even if ordinary dictionaries show two or more "acceptable"
spellings. Hence, the dictionary contains LABELED and LABELING, but
not LABELLED or LABELLING, even though all four are actually
acceptable. The intention is to enforce uniformity within each
document. The author apologizes for the restriction on creativity and
diversity that this necessitates, but believes that it is the best
policy for this program.

The dictionary contains many technical and computer terms such as
MICROPROGRAM and DEBUGGER, but does not contain extreme jargon words
such as CONTROLIFY or VALRET.

The dictionary contains no proper names other than names of countries
and states of the United States. The reason is that it would be
virtually impossible to contain all of the proper names that commonly
arise in normal use. Users should keep proper names (and other
correctly spelled words) that arise in their own work in private
dictionaries to avoid having to repeatedly tell SPELL to accept them.

The dictionary is significantly smaller than that found in other
spelling checkers, such as the DEC TOPS-20 program.
The author believes that the larger dictionary would not reduce the
number of false misspelling indications by very much. Users who find
words that SPELL does not know, but should, are urged to mail pointers
to lists of such words (or the documents in which they appear) to
BUG-SPELL@MIT-ML. They will be considered for addition to the
dictionary. Users who wish to argue that an extremely large dictionary
would be better should mail pointers to specific documents
demonstrating this. The argument might be valid, but evidence is
needed.

11.5 The "BASK" command

The "BASK" command is a variant of the "ASK" command intended for
invocations of SPELL by other programs. The command line (presumably
from a JCL line) is "BASK" followed by the word to be looked up, and
then the name of the file to which the information is to be written,
for example:

        :SPELL BASK WHEREVER,TEST OUT$KILL

looks up the word "wherever", and writes a report in the file
"TEST OUT".

The report is similar to the information displayed after the "A"
command, except that formatting characters are absent and the result
of the search is indicated by the first character of the file:

        If the word was found directly, the file consists of a "*"
        followed by any dictionary flags that the word may have had.

        If the word was found through suffix removal, the file
        consists of a "+", the root word, and the dictionary flags of
        the root word.

        If the word was not found and there are no words close to it,
        the file consists of a "#".

        If the word was not found but there are close words, the file
        consists of a "&" immediately followed by a list of close
        words, separated from each other by newlines.

The file is always terminated by a .

11.6 Dictionary flags

Words in SPELL's main dictionary (but not the other dictionaries) may
have flags associated with them to indicate the legality of suffixes
without the need to keep the full suffixed words in the dictionary.
The flags have "names" consisting of single letters.
Their meaning is as follows:

Let # and @ be "variables" that can stand for any letter. Upper case
letters are constants. "..." stands for any string of zero or more
letters, but note that no word may exist in the dictionary which is
not at least 2 letters long, so, for example, FLY may not be produced
by placing the "Y" flag on "F". Also, no flag is effective unless the
word that it creates is at least 4 letters long, so, for example, WED
may not be produced by placing the "D" flag on "WE".

"V" flag:
        ...E --> ...IVE  as in CREATE --> CREATIVE
        if # .ne. E, ...# --> ...#IVE  as in PREVENT --> PREVENTIVE

"N" flag:
        ...E --> ...ION  as in CREATE --> CREATION
        ...Y --> ...ICATION  as in MULTIPLY --> MULTIPLICATION
        if # .ne. E or Y, ...# --> ...#EN  as in FALL --> FALLEN

"X" flag:
        ...E --> ...IONS  as in CREATE --> CREATIONS
        ...Y --> ...ICATIONS  as in MULTIPLY --> MULTIPLICATIONS
        if # .ne. E or Y, ...# --> ...#ENS  as in WEAK --> WEAKENS

"H" flag:
        ...Y --> ...IETH  as in TWENTY --> TWENTIETH
        if # .ne. Y, ...# --> ...#TH  as in HUNDRED --> HUNDREDTH

"Y" flag:
        ... --> ...LY  as in QUICK --> QUICKLY

"G" flag:
        ...E --> ...ING  as in FILE --> FILING
        if # .ne. E, ...# --> ...#ING  as in CROSS --> CROSSING

"J" flag:
        ...E --> ...INGS  as in FILE --> FILINGS
        if # .ne. E, ...# --> ...#INGS  as in CROSS --> CROSSINGS

"D" flag:
        ...E --> ...ED  as in CREATE --> CREATED
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IED  as in IMPLY --> IMPLIED
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#ED  as in CROSS --> CROSSED
                                   or CONVEY --> CONVEYED

"T" flag:
        ...E --> ...EST  as in LATE --> LATEST
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IEST  as in DIRTY --> DIRTIEST
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#EST  as in SMALL --> SMALLEST
                                    or GRAY --> GRAYEST

"R" flag:
        ...E --> ...ER  as in SKATE --> SKATER
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IER  as in MULTIPLY --> MULTIPLIER
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#ER  as in BUILD --> BUILDER
                                   or CONVEY --> CONVEYER

"Z" flag:
        ...E --> ...ERS  as in SKATE --> SKATERS
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IERS  as in MULTIPLY --> MULTIPLIERS
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#ERS  as in BUILD --> BUILDERS
                                    or SLAY --> SLAYERS

"S" flag:
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IES  as in IMPLY --> IMPLIES
        if # .eq. S, X, Z, or H,
                ...# --> ...#ES  as in FIX --> FIXES
        if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
                ...# --> ...#S  as in BAT --> BATS
                                or CONVEY --> CONVEYS

"P" flag:
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@INESS  as in CLOUDY --> CLOUDINESS
        if # .ne. Y, or @ = A, E, I, O, or U,
                ...@# --> ...@#NESS  as in LATE --> LATENESS
                                     or GRAY --> GRAYNESS

"M" flag:
        ... --> ...'S  as in DOG --> DOG'S

Note: The existence of a flag on a root word in the dictionary is not
by itself sufficient to cause SPELL to recognize the indicated word
ending. If there is more than one root for which a flag will indicate
a given word, only one of the roots is the correct one for which the
flag is effective; generally it is the longest root. For example, the
"D" rule implies that either PASS or PASSE, with a "D" flag, will
yield PASSED. The flag must be on PASSE; it will be ineffective on
PASS. This is because, when SPELL encounters the word PASSED and fails
to find it in its dictionary, it strips off the "D" and looks up
PASSE. Upon finding PASSE, it then accepts PASSED if and only if PASSE
has the "D" flag. Only if the word PASSE is not in the main dictionary
at all does the program strip off the "E" and search for PASS.

Furthermore, some combinations of flags are forbidden to allow for
dense flag encoding to save space. For example, only one of the "P",
"J", or "V" flags may be on in any one word.

Therefore, be careful when installing dictionary flags. When in doubt
about how to install a flag, don't; let the program do it for you.
When a word is read into the main dictionary, whenever possible SPELL
sets the appropriate flag in an existing word instead of entering the
word itself. So, for example, reading PASSED into the dictionary would
result in the "D" flag being set correctly.

Warning: The above definitions are part of the internal behavior of
SPELL, and are subject to change without notice. Users copying the
dictionary should always get the then-current version of this document
to get an accurate description of the flags.

11.7 Preparing new versions of SPELL

The main dictionary is dictionary 0, and may be accessed by that
number. When it is dumped, its flags will be written also. A file
containing flags must not be read into a dictionary other than zero,
nor should it be read in if dictionaries other than zero contain any
words. (Otherwise, it may attempt unsuccessfully to set a flag on a
word whose dictionary number is nonzero.) Therefore, flags should be
used only on the dictionary that is loaded when a new version of SPELL
is created.

To create a new version of SPELL, assemble it under MIDAS and start
it. Give the "LOAD" command and read in the master dictionary,
specifying dictionary zero. Then give the "WRITE" command, with the
name of the desired image file. On ITS, this should be a file such as
"TS SPELL". On Twenex, it should be a file such as "ISPELL.EXE". On
ITS, it will ask whether to purify it. Say "N", for the purification
stuff is still not repaired from an ancient version. On Twenex it will
always be written in sharable form.
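
As a supplementary illustration, the closeness criterion of section
11.3 can be written out as a short Python sketch. It is not SPELL's
implementation. The function names `is_close` and `close_words` are
hypothetical, words are folded to upper case as an assumption, and the
choice to treat identical words as not "close" is a reading of the
criterion rather than documented behavior.

```python
def is_close(a, b):
    """True if a and b differ in exactly one of the three listed ways."""
    a, b = a.upper(), b.upper()
    if a == b:
        return False  # identical words do not differ at all
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True  # one letter is different
        # two adjacent letters are interchanged
        return (len(diffs) == 2
                and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]]
                and a[diffs[1]] == b[diffs[0]])
    if abs(len(a) - len(b)) == 1:
        shorter, longer = sorted((a, b), key=len)
        # one letter is missing in one word and present in the other
        return any(longer[:i] + longer[i + 1:] == shorter
                   for i in range(len(longer)))
    return False


def close_words(word, dictionary):
    """Return the dictionary words close to `word`, in sorted order."""
    return sorted(w for w in dictionary if is_close(word, w))
```

With this sketch, each of SEUQENCE, SERQUENCE, SEQUNCE, and SEQUENCW
is close to SEQUENCE, which matches the example given in section 11.3.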