FileTokeniser
-------------

This distribution contains the FileTokeniser, a component that tokenises a file. It combines the LineReader, Sentencer and Tokeniser components.

Usage: construct a FileTokeniser with the name of the file you want tokenised, register any WordEvent listeners with the addWordListener() method, and start processing by calling the FileTokeniser's start() method. Each registered listener receives every word as it is read. To extract the word from an event, call the event's getValue() method with the parameter "token"; it returns null once the end of the input file has been reached. Markup, such as sentence-boundary markers, is also returned through these events; markup tokens are normally enclosed in angle brackets.

This distribution contains all the required classes, including those that tie the separate components together.

You can also run the FileTokeniser as a stand-alone program. In that case, supply the names of the files to be tokenised as command-line parameters; each token is printed on a separate line to System.out.

(C) 2000 Phrasys (www.phrasys.com / www.phrasys.co.uk)
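The listener-based usage described above can be illustrated with a small, self-contained Java sketch. It does not use the classes shipped in this distribution. WordEvent, WordListener (with its wordRead() method), ToyTokeniser and ListenerSketch are hypothetical stand-ins, and the toy tokeniser simply splits lines on whitespace, with no sentence detection and no markup. Only the shape of the interaction follows the description: register listeners with addWordListener(), call start(), read each word with getValue("token"), and treat null as the end of input.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the listener pattern only. WordEvent, WordListener and ToyTokeniser are
// hypothetical stand-ins, NOT the classes shipped in this distribution.
public class ListenerSketch {

    // Hypothetical event: holds key/value data; "token" maps to null at end of input.
    static class WordEvent {
        private final Map<String, String> values = new HashMap<>();

        WordEvent(String token) { values.put("token", token); }

        String getValue(String key) { return values.get(key); }
    }

    // Hypothetical listener interface; the real method name may differ.
    interface WordListener {
        void wordRead(WordEvent e);
    }

    // Toy tokeniser: splits each line on whitespace; no sentence detection, no markup.
    static class ToyTokeniser {
        private final BufferedReader in;
        private final List<WordListener> listeners = new ArrayList<>();

        ToyTokeniser(BufferedReader in) { this.in = in; }

        void addWordListener(WordListener l) { listeners.add(l); }

        void start() throws IOException {
            String line;
            while ((line = in.readLine()) != null) {
                for (String tok : line.trim().split("\\s+")) {
                    if (!tok.isEmpty()) fire(tok); // skip the "" produced by blank lines
                }
            }
            fire(null); // end of input: listeners see a null "token"
        }

        private void fire(String token) {
            WordEvent e = new WordEvent(token);
            for (WordListener l : listeners) l.wordRead(e);
        }
    }

    // Runs the toy tokeniser over a string and returns the tokens one listener received.
    public static List<String> collect(String text) {
        List<String> tokens = new ArrayList<>();
        ToyTokeniser t = new ToyTokeniser(new BufferedReader(new StringReader(text)));
        t.addWordListener(e -> {
            String tok = e.getValue("token");
            if (tok != null) tokens.add(tok); // null means end of input, not a word
        });
        try {
            t.start();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return tokens;
    }

    // Loosely mirrors the stand-alone mode: tokenise each named file, one token per line.
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            try (BufferedReader in = new BufferedReader(new FileReader(name))) {
                ToyTokeniser t = new ToyTokeniser(in);
                t.addWordListener(e -> {
                    String tok = e.getValue("token");
                    if (tok != null) System.out.println(tok);
                });
                t.start();
            }
        }
    }
}
```

Checking for null inside the listener, rather than relying on a separate "finished" callback, matches the end-of-input convention described above. Running `java ListenerSketch a.txt b.txt` on the sketch prints each whitespace-separated token of the named files on its own line.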