Class com.phrasys.Sentencer

java.lang.Object
    |
    +----com.phrasys.Sentencer

public class Sentencer
extends java.lang.Object
implements java.io.Serializable, LineListener
This is a sentence splitter for English language texts. It reads LineEvents and tries to identify sentence and paragraph boundaries in the data. As these are found, SentenceEvents and ParagraphEvents are sent to registered listeners. The recommended sequence is for the Sentencer to be followed by the Tokeniser. For input, the LineReader or any other emitter of LineEvents can be used.
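A minimal sketch of this event flow follows. It assumes nothing about the real com.phrasys types. Each stage is modelled as a java.util.function.Consumer<String>, and the class name PipelineSketch is hypothetical. The split rule (break after '.', '!' or '?' when followed by whitespace) is a deliberately naive stand-in for the Sentencer's own boundary detection, and the "tokeniser" stage simply splits on whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a line emitter -> Sentencer -> Tokeniser chain. */
class PipelineSketch {

    /** Naive "sentencer" stage: splits each incoming line and forwards sentences downstream. */
    static Consumer<String> sentencer(Consumer<String> next) {
        return line -> {
            // Break after terminal punctuation that is followed by whitespace (naive assumption).
            for (String s : line.split("(?<=[.!?])\\s+")) {
                if (!s.isBlank()) {
                    next.accept(s.trim());
                }
            }
        };
    }

    /** Naive "tokeniser" stage: splits a sentence on whitespace and collects the tokens. */
    static Consumer<String> tokeniser(List<List<String>> out) {
        return sentence -> out.add(List.of(sentence.split("\\s+")));
    }

    /** Pushes each line through the chain, as a line emitter would. */
    static List<List<String>> run(List<String> lines) {
        List<List<String>> tokens = new ArrayList<>();
        Consumer<String> chain = sentencer(tokeniser(tokens));
        lines.forEach(chain);
        return tokens;
    }
}
```

For example, running the single line "It rains. We stay in." through the chain yields two token lists, the second being [We, stay, in.]. Unlike a real sentence splitter, this sketch treats every line independently, so a sentence broken across two lines would be cut in the wrong place.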

Version:
1.0
Author:
Oliver Mason

Variable Index

 o DIRECTQUOTES
directionalised quotes (``text'')
 o MIXEDQUOTES
mixed quotes (`text")
 o PLAINQUOTES
plain quotes ("text")

Constructor Index

 o Sentencer()
Constructor.

Method Index

 o addParagraphListener(ParagraphListener)
Add a paragraph listener.
 o addSentenceListener(SentenceListener)
Add a sentence listener.
 o escapeSGML(String)
Escape angle brackets.
 o getQuoteStyle()
Retrieve the style in which quotes are being processed.
 o newLine(LineEvent)
Process a line.
 o removeParagraphListener(ParagraphListener)
Remove a paragraph listener.
 o removeSentenceListener(SentenceListener)
Remove a sentence listener.
 o replaceAllSubstring(String, String, String)
Replace all occurrences of a string.
 o replaceDirectionalisedQuotes(String)
Replace directionalised quote marks by entity names.
 o replaceMixedQuotes(String)
Replace mixed quote marks by entity names.
 o replacePlainQuotes(String)
Replace plain quote marks by entity names.
 o replaceSubstring(String, String, String)
Replace a substring.
 o setQuoteStyle(int)
Set the way quotes are being processed.

Field Detail

 o PLAINQUOTES
public static final int PLAINQUOTES
          plain quotes ("text")
 o MIXEDQUOTES
public static final int MIXEDQUOTES
          mixed quotes (`text")
 o DIRECTQUOTES
public static final int DIRECTQUOTES
          directionalised quotes (``text'')

Constructor Detail

 o Sentencer
public Sentencer()
          Creates a new Sentencer.

Method Detail

 o addSentenceListener
public void addSentenceListener(SentenceListener listener)
          Add a sentence listener. After each sentence, a SentenceEvent is sent to all registered listeners.
Parameters:
listener - the listener to register.
 o addParagraphListener
public void addParagraphListener(ParagraphListener listener)
          Add a paragraph listener. After each paragraph, a ParagraphEvent is sent to all registered listeners.
Parameters:
listener - the listener to register.
 o removeSentenceListener
public void removeSentenceListener(SentenceListener listener)
          Remove a sentence listener.
Parameters:
listener - the listener to be removed.
 o removeParagraphListener
public void removeParagraphListener(ParagraphListener listener)
          Remove a paragraph listener.
Parameters:
listener - the listener to be removed.
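The add/remove methods follow the usual Java event-listener pattern. The sketch below illustrates that pattern under stated assumptions. SketchSentenceEvent, SketchSentenceListener and its callback newSentence are hypothetical stand-ins, because the actual SentenceListener interface is not documented in this section. The paragraph-listener methods would follow the same shape.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical event type standing in for SentenceEvent. */
final class SketchSentenceEvent {
    final String text;

    SketchSentenceEvent(String text) {
        this.text = text;
    }
}

/** Hypothetical listener interface standing in for SentenceListener. */
interface SketchSentenceListener {
    void newSentence(SketchSentenceEvent e);
}

/** Minimal registry in the style of addSentenceListener/removeSentenceListener. */
class ListenerRegistrySketch {
    // Copy-on-write: iteration works on a snapshot, so listeners may deregister during delivery.
    private final List<SketchSentenceListener> listeners = new CopyOnWriteArrayList<>();

    void addSentenceListener(SketchSentenceListener l) {
        listeners.add(l);
    }

    void removeSentenceListener(SketchSentenceListener l) {
        listeners.remove(l);
    }

    /** Sends one event to every currently registered listener. */
    void fireSentence(String text) {
        SketchSentenceEvent e = new SketchSentenceEvent(text);
        for (SketchSentenceListener l : listeners) {
            l.newSentence(e);
        }
    }

    int listenerCount() {
        return listeners.size();
    }
}
```

A CopyOnWriteArrayList is one common choice here because a listener can remove itself while an event is being delivered without triggering a ConcurrentModificationException; whether Sentencer does the same is not stated.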
 o getQuoteStyle
public int getQuoteStyle()
          Retrieve the style in which quotes are being processed. The quote style is one of PLAINQUOTES, MIXEDQUOTES and DIRECTQUOTES. For examples of what these styles look like, see the documentation entry for the respective constant.
Returns:
an integer identifying the current quote style (PLAINQUOTES, MIXEDQUOTES or DIRECTQUOTES).
See Also:
setQuoteStyle(int)
 o setQuoteStyle
public void setQuoteStyle(int style) throws java.lang.IllegalArgumentException
          Set the way quotes are being processed.
Parameters:
style - the new quote style; one of PLAINQUOTES, MIXEDQUOTES or DIRECTQUOTES.
Throws:
java.lang.IllegalArgumentException - if style is not one of PLAINQUOTES, MIXEDQUOTES or DIRECTQUOTES.
See Also:
getQuoteStyle()
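The following is a sketch of how a validated quote-style setting might look. The integer values 0, 1 and 2 and the PLAINQUOTES default are assumptions made for illustration, and QuoteStyleSketch is a hypothetical name. Because the actual values of the Sentencer constants are not given here, client code should always pass the named constants rather than literals.

```java
/** Sketch of validated quote-style state; constant values and default are assumed. */
class QuoteStyleSketch {
    static final int PLAINQUOTES = 0;   // assumed value
    static final int MIXEDQUOTES = 1;   // assumed value
    static final int DIRECTQUOTES = 2;  // assumed value

    private int style = PLAINQUOTES;    // assumed default

    int getQuoteStyle() {
        return style;
    }

    /** Rejects anything other than the three known constants and keeps the old style. */
    void setQuoteStyle(int style) {
        if (style != PLAINQUOTES && style != MIXEDQUOTES && style != DIRECTQUOTES) {
            throw new IllegalArgumentException("unknown quote style: " + style);
        }
        this.style = style;
    }
}
```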
 o replaceMixedQuotes
public static java.lang.String replaceMixedQuotes(java.lang.String line)
          Replace mixed quote marks by entity names. In the mixed style, quote marks are normalised so that a backquote (`) stands for an opening quote, while a double quote (") stands for a closing quote. This function replaces these quotes by the respective entity names bquo and equo. They will also be surrounded by spaces in order to make tokenisation easier.
Parameters:
line - a line that might contain quote marks.
Returns:
the same line with quote marks replaced.
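Below is a minimal sketch of the mixed-style replacement. It assumes the entities are written as the SGML references &bquo; and &equo;, because the documentation names the entities but not their exact textual form. The class name MixedQuotesSketch is hypothetical.

```java
/** Sketch of mixed-style quote replacement; the "&bquo;"/"&equo;" spelling is an assumption. */
class MixedQuotesSketch {
    static final String BQUO = " &bquo; ";  // assumed entity form, padded with spaces
    static final String EQUO = " &equo; ";  // assumed entity form, padded with spaces

    static String replaceMixedQuotes(String line) {
        StringBuilder sb = new StringBuilder(line.length() + 16);
        for (char c : line.toCharArray()) {
            if (c == '`') {
                sb.append(BQUO);        // backquote opens a quotation
            } else if (c == '"') {
                sb.append(EQUO);        // double quote closes it
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Padding every entity with spaces can leave doubled spaces next to existing blanks. That is harmless for a tokeniser that splits on runs of whitespace.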
 o replaceDirectionalisedQuotes
public static java.lang.String replaceDirectionalisedQuotes(java.lang.String line)
          Replace directionalised quote marks by entity names. Directionalised quote marks are `` for an opening quote and '' for a closing quote. This function replaces these quotes by the respective entity names bquo and equo. They will also be surrounded by spaces in order to make tokenisation easier.
Parameters:
line - a line that might contain quote marks.
Returns:
the same line with quote marks replaced.
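In the directionalised style the markers are two-character sequences, so a plain literal replacement is enough. In this sketch, the &bquo;/&equo; spelling of the entities and the class name DirectionalisedQuotesSketch are assumptions for illustration.

```java
/** Sketch of directionalised quote replacement; entity spelling is an assumption. */
class DirectionalisedQuotesSketch {
    static String replaceDirectionalisedQuotes(String line) {
        // `` opens and '' closes; a single ` or ' (e.g. an apostrophe) is left alone.
        return line.replace("``", " &bquo; ").replace("''", " &equo; ");
    }
}
```

Single apostrophes, as in "It's", are left untouched because only a doubled '' counts as a closing mark.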
 o replacePlainQuotes
public static java.lang.String replacePlainQuotes(java.lang.String line)
          Replace plain quote marks by entity names. The function tries to guess whether quotes are opening or closing, depending on the position of blank spaces around them. Opening quotes are replaced by bquo and closing ones by equo. Those which cannot be identified are replaced by quo. Effectively, all double quotes are removed from the input string. The entities will also be surrounded by spaces in order to make tokenisation easier.
Parameters:
line - a line that might contain quote marks.
Returns:
the same line with quote marks replaced.
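The position-based guess can be sketched as follows, assuming the simplest reading of the rule above. A quote with whitespace (or the start of the line) before it and a non-blank character after it is treated as opening. The mirror case is treated as closing, and anything else becomes quo. The entity spelling (&bquo;, &equo;, &quo;) and the class name are assumptions, and the real method may use further context.

```java
/** Sketch of the blank-space heuristic for plain quotes; entity spelling is an assumption. */
class PlainQuotesSketch {
    static String replacePlainQuotes(String line) {
        StringBuilder sb = new StringBuilder();
        int n = line.length();
        for (int i = 0; i < n; i++) {
            char c = line.charAt(i);
            if (c != '"') {
                sb.append(c);
                continue;
            }
            // Line boundaries count as blanks on either side.
            boolean spaceBefore = i == 0 || Character.isWhitespace(line.charAt(i - 1));
            boolean spaceAfter = i == n - 1 || Character.isWhitespace(line.charAt(i + 1));
            if (spaceBefore && !spaceAfter) {
                sb.append(" &bquo; ");   // blank before, text after: opening
            } else if (!spaceBefore && spaceAfter) {
                sb.append(" &equo; ");   // text before, blank after: closing
            } else {
                sb.append(" &quo; ");    // ambiguous: direction unknown
            }
        }
        return sb.toString();
    }
}
```

Every double quote is replaced by one of the three entities, so none survives in the output, which matches the behaviour described above.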
 o escapeSGML
public static java.lang.String escapeSGML(java.lang.String line)
          Escape angle brackets and ampersands. If the input text is not marked up in XML but will later be enriched with tags, characters that have a special meaning in XML should be escaped. The characters to be replaced are &, < and >.
Parameters:
line - a line that might contain special characters.
Returns:
the same line with characters replaced.
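This sketch of the escaping step assumes that the replacements are the standard XML predefined entities &amp;, &lt; and &gt;. The documentation lists the characters but not what they become. EscapeSketch is a hypothetical name.

```java
/** Sketch of XML escaping, assuming the standard predefined entities as replacements. */
class EscapeSketch {
    static String escapeSGML(String line) {
        // '&' must be replaced first; otherwise the '&' inside "&lt;" or "&gt;" would be escaped again.
        return line.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```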
 o replaceSubstring
public static java.lang.String replaceSubstring(java.lang.String fullString,
                                      java.lang.String replace,
                                      java.lang.String replacement)
          Replace a substring. This function replaces the first occurrence of replace in fullString with replacement.
Parameters:
fullString - the string on which the substitution takes place.
replace - the substring that should be replaced.
replacement - the string to replace that substring.
Returns:
the new string.
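A literal first-occurrence replacement can be written with indexOf. This is an illustrative sketch with a hypothetical class name. Returning the input unchanged when the substring is absent is an assumption, because the documentation does not say what happens in that case.

```java
/** Sketch of a literal (non-regex) first-occurrence replacement. */
class ReplaceFirstSketch {
    static String replaceSubstring(String fullString, String replace, String replacement) {
        int pos = fullString.indexOf(replace);
        if (pos < 0) {
            return fullString;  // assumed: nothing to replace
        }
        return fullString.substring(0, pos) + replacement + fullString.substring(pos + replace.length());
    }
}
```

String.replaceFirst is not a drop-in alternative. It interprets its first argument as a regular expression and its second as a replacement pattern, so it would need Pattern.quote and Matcher.quoteReplacement to behave literally.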
 o replaceAllSubstring
public static java.lang.String replaceAllSubstring(java.lang.String fullString,
                                         java.lang.String replace,
                                         java.lang.String replacement)
          Replace all occurrences of a string. This function replaces every occurrence of replace in fullString with replacement.
Parameters:
fullString - the string on which the substitution takes place.
replace - the substring that should be replaced.
replacement - the string to replace that substring.
Returns:
the new string.
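Here is a sketch of the replace-all variant using the same indexOf approach. It scans left to right, so text inserted by a replacement is never matched again. Treating an empty search string as a no-op is an assumption, made to avoid an endless loop. The class name is hypothetical.

```java
/** Sketch of a literal, non-overlapping replace-all. */
class ReplaceAllSketch {
    static String replaceAllSubstring(String fullString, String replace, String replacement) {
        if (replace.isEmpty()) {
            return fullString;  // assumed: an empty pattern would otherwise match forever
        }
        StringBuilder sb = new StringBuilder();
        int from = 0;
        int pos;
        while ((pos = fullString.indexOf(replace, from)) >= 0) {
            sb.append(fullString, from, pos).append(replacement);
            from = pos + replace.length();  // resume after the match, never inside the replacement
        }
        sb.append(fullString, from, fullString.length());
        return sb.toString();
    }
}
```

On Java 5 and later, String.replace(CharSequence, CharSequence) performs the same literal, non-overlapping replacement, although it handles an empty target differently.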
 o newLine
public void newLine(LineEvent le)
          Process a line. This method is called each time the object receives a new LineEvent. It sets off the processing of the line. The end of the input data is marked by an empty LineEvent or a null parameter.
Parameters:
le - the line event to process.
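Because text arrives one line at a time and an empty line or null ends the input, text must be buffered across line boundaries. The sketch below shows only that buffering and the end-of-input flush. Several parts are assumptions:

* The boundary test (terminal punctuation followed by whitespace) is a naive stand-in for the Sentencer's heuristics.
* Paragraph detection is omitted.
* The method takes a plain String instead of a LineEvent.
* All names are hypothetical.

```java
import java.util.function.Consumer;

/** Naive line-driven splitter illustrating buffering; NOT the Sentencer algorithm. */
class LineDrivenSplitterSketch {
    private final StringBuilder buffer = new StringBuilder();
    private final Consumer<String> sentenceSink;

    LineDrivenSplitterSketch(Consumer<String> sentenceSink) {
        this.sentenceSink = sentenceSink;
    }

    /** Called once per line; null or "" marks the end of the input. */
    void newLine(String line) {
        if (line == null || line.isEmpty()) {
            flush();                       // end of data: emit whatever is left
            return;
        }
        if (buffer.length() > 0) {
            buffer.append(' ');            // a line break counts as a space
        }
        buffer.append(line.trim());
        emitCompleteSentences();
    }

    /** Emits every sentence whose end is certain and keeps the unfinished tail. */
    private void emitCompleteSentences() {
        int start = 0;
        for (int i = 0; i < buffer.length(); i++) {
            char c = buffer.charAt(i);
            boolean atBoundary = (c == '.' || c == '!' || c == '?')
                    && i + 1 < buffer.length()
                    && Character.isWhitespace(buffer.charAt(i + 1));
            if (atBoundary) {
                sentenceSink.accept(buffer.substring(start, i + 1).trim());
                start = i + 1;
            }
        }
        buffer.delete(0, start);
    }

    private void flush() {
        String rest = buffer.toString().trim();
        if (!rest.isEmpty()) {
            sentenceSink.accept(rest);
        }
        buffer.setLength(0);
    }
}
```

The following sequence shows the buffering at work:

1. Feeding "It rains. We" emits "It rains." immediately.
2. Feeding "stay in." emits nothing, because the second sentence is not yet known to be complete.
3. Feeding null ends the input and emits "We stay in.".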