Determining the Sentence Boundaries in a Unicode String
Posted on: October 13, 2010 at 12:00 AM
In this section, you will learn how to determine the sentence boundaries in a Unicode string.

From the previous section, you are already familiar with the BreakIterator class. This class finds text boundaries in a language-independent manner. It is locale-sensitive: the boundary rules it applies depend on the locale the iterator was created for. It provides four factory methods: getLineInstance(), getCharacterInstance(), getWordInstance() and getSentenceInstance(), each of which can also be called with an explicit Locale. Here we are going to parse the text and break it into sentences.
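To make the difference between the four factory methods concrete, here is a minimal sketch that counts how many segments each kind of iterator finds in the same string. The class name BoundaryCounts, the helper countSegments and the sample text are assumptions made for illustration only; Locale.US is fixed so that the result does not depend on the machine's default locale.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class BoundaryCounts {
    // Hypothetical helper: counts the segments between consecutive boundaries.
    static int countSegments(BreakIterator it, String text) {
        it.setText(text);
        int count = 0;
        it.first(); // position the iterator on the first boundary (index 0)
        while (it.next() != BreakIterator.DONE) {
            count++; // one more segment ends at the boundary just returned
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Hi there! Bye.";
        Locale locale = Locale.US; // assumption: fixed locale for repeatable results
        System.out.println("characters: "
                + countSegments(BreakIterator.getCharacterInstance(locale), text));
        System.out.println("words:      "
                + countSegments(BreakIterator.getWordInstance(locale), text));
        System.out.println("lines:      "
                + countSegments(BreakIterator.getLineInstance(locale), text));
        System.out.println("sentences:  "
                + countSegments(BreakIterator.getSentenceInstance(locale), text));
    }
}
```

The same loop works for every iterator type; only the factory method changes what counts as a boundary. Note that the word iterator also reports whitespace and punctuation as segments of their own, so its count is higher than the number of words.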

The SentenceBoundaries program listed in this section invokes the factory method getSentenceInstance() and passes a text containing the sentence terminators ! and ? to the method setText(). The method getSentenceInstance() creates and returns a BreakIterator for sentence breaks using the default locale. The setText() method sets the text string to be scanned. A loop then walks through the sentence boundaries, which in this text fall directly after the ! and ? characters, and prints the text between each pair of consecutive boundaries.
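Because the no-argument getSentenceInstance() depends on the default locale, the result can in principle vary between machines. A possible variation, sketched here under the assumption that English rules are wanted, passes a Locale explicitly and collects the sentences into a list. The class name SentenceSplitter and the method split are hypothetical names, not part of the Java API.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitter {
    // Hypothetical helper: splits text into sentences using the rules of the given locale.
    static List<String> split(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk the boundaries pairwise; each pair delimits one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Hello! How are you? This is a BreakIterator example.";
        for (String sentence : split(text, Locale.US)) {
            // Brackets make any whitespace kept at the end of a sentence visible.
            System.out.println("[" + sentence + "]");
        }
    }
}
```

The pieces are substrings of the original text, so joining them back together reproduces the input exactly; call trim() on each piece if trailing whitespace is not wanted.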

current(): This method of the BreakIterator class returns the character index of the text boundary that was most recently returned.

next(): This method of the BreakIterator class returns the boundary following the current boundary, or BreakIterator.DONE if the current boundary is already the last one in the text.

Here is the code:

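import java.text.BreakIterator;

public class SentenceBoundaries {
	public static void main(String[] args) {
		String text = "Hello!How are you?This is a BreakIterator Example";
		// Sentence iterator for the default locale
		BreakIterator bi = BreakIterator.getSentenceInstance();
		bi.setText(text);
		// first() returns the first boundary (index 0); next() returns each following one
		int start = bi.first();
		for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
			// Each sentence is the text between two consecutive boundaries
			System.out.println(text.substring(start, end));
		}
	}
}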

Output:

Hello!
How are you?
This is a BreakIterator Example
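To see the raw boundary positions behind this output, the following sketch lists every index returned by first() and next(), and checks that current() always reports the boundary that was just returned. The class name BoundaryIndexes and the method boundaries are hypothetical; Locale.US is an assumption, where the program above uses the default locale.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BoundaryIndexes {
    // Hypothetical helper: returns all sentence boundary indexes,
    // including 0 and text.length().
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            // current() reports the index that first() or next() just returned
            if (it.current() != b) {
                throw new IllegalStateException("unexpected iterator position");
            }
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hello!How are you?This is a BreakIterator Example";
        System.out.println(boundaries(text, Locale.US));
    }
}
```

Each printed sentence in the output above corresponds to the substring between two neighbouring indexes in this list.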
