Determining the Word Boundaries in a Unicode String
Posted on: October 13, 2010 at 12:00 AM
In this section, you will learn how to determine the word boundaries in a Unicode string.


Generally, we use the split() method or the StringTokenizer class to break a string into words. However, the BreakIterator class has significant advantages over both. Instead of cutting at a fixed delimiter, it applies locale-sensitive rules, so the same code can parse text in different languages. It provides methods to find the location of boundaries in a text string. Here we use it to break a string into words.
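To make the difference concrete, here is a minimal sketch that runs both approaches on the same sentence. The class name SplitVersusBreakIterator, its helper methods, and the sample text are illustrative assumptions, not part of the original example. Locale.ENGLISH is passed explicitly only to keep the result predictable.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, used only for this comparison.
public class SplitVersusBreakIterator {

    // Fixed-delimiter approach: split on runs of whitespace.
    // Punctuation stays attached to the neighbouring word.
    public static List<String> splitOnWhitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // BreakIterator approach: cut the text at every reported word boundary.
    public static List<String> segments(String text) {
        BreakIterator bi = BreakIterator.getWordInstance(Locale.ENGLISH);
        bi.setText(text);
        List<String> parts = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            parts.add(text.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "Hello, world!";
        // Brackets make whitespace-only segments visible in the output.
        for (String s : splitOnWhitespace(text)) {
            System.out.println("split:    [" + s + "]");
        }
        for (String s : segments(text)) {
            System.out.println("iterator: [" + s + "]");
        }
    }
}
```

With split(), the comma and the exclamation mark remain glued to the words ("Hello," and "world!"). The iterator reports them as segments of their own, along with the space between the words.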

In the given example, we invoke the factory method getWordInstance() and pass the string 'This is a BreakIterator Example' to the setText() method. The getWordInstance() method creates a BreakIterator for word breaks using the default locale. The setText() method sets the text string to be scanned. We then loop over the word boundaries, using each pair of consecutive boundaries to cut the text into segments.

current(): This method of the BreakIterator class returns the character index of the text boundary that was most recently returned by a navigation method such as first() or next().

next(): This method of the BreakIterator class returns the boundary following the current boundary, or BreakIterator.DONE if the current boundary is the last one in the text.
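As a small illustration of how these two methods work together, the sketch below collects every boundary offset of a short string. The class name BoundaryOffsets and the boundaries() helper are assumptions made for this illustration. The first() call it uses is a standard BreakIterator method that moves to the start of the text.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only to show current() and next().
public class BoundaryOffsets {

    public static List<Integer> boundaries(String text) {
        BreakIterator bi = BreakIterator.getWordInstance();
        bi.setText(text);
        List<Integer> offsets = new ArrayList<>();
        // first() returns the first boundary (offset 0); next() then advances
        // one boundary at a time until it returns DONE.
        for (int pos = bi.first(); pos != BreakIterator.DONE; pos = bi.next()) {
            // current() reports the offset that was most recently returned.
            offsets.add(bi.current());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Boundaries fall before "This", after "This", after the space, and after "is".
        System.out.println(boundaries("This is")); // prints [0, 4, 5, 7]
    }
}
```

Each pair of neighbouring offsets marks one segment, which is why the space between the two words shows up as a segment of length one.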

Here is the code:

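import java.text.BreakIterator;

public class WordBoundaries {
	public static void main(String[] args) {
		String st = "This is a BreakIterator Example";
		// Create a word-break iterator for the default locale.
		BreakIterator bi = BreakIterator.getWordInstance();
		bi.setText(st);
		// Start at the first boundary, which is index 0.
		int index = bi.first();
		// Each next() call advances to the following boundary; DONE marks the end.
		while (bi.next() != BreakIterator.DONE) {
			// The text between two consecutive boundaries is one segment.
			String str = st.substring(index, bi.current());
			System.out.println(str);
			index = bi.current();
		}
	}
}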

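Output (the iterator also reports the spaces between words as segments, so a line containing a single space is printed between each pair of words):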

This

is

a

BreakIterator

Example
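If only the words themselves are wanted, the whitespace segments can be filtered out. The following is a minimal sketch of one way to do that, not the only one. The class name WordsOnly and the rule "a segment is a word if it starts with a letter or digit" are assumptions made for this illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of the example above that skips non-word segments.
public class WordsOnly {

    public static List<String> words(String text) {
        BreakIterator bi = BreakIterator.getWordInstance();
        bi.setText(text);
        List<String> words = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            // Assumption: keep a segment only if it begins with a letter or digit,
            // which drops whitespace and punctuation segments.
            if (Character.isLetterOrDigit(text.codePointAt(start))) {
                words.add(text.substring(start, end));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        for (String word : words("This is a BreakIterator Example")) {
            System.out.println(word);
        }
    }
}
```

Run on the example string, this prints the five words one per line, without the space-only lines in between.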
