Calculating Repeating Words Using Regular Expressions
This example describes how to count the repeated words in a file using regular expressions.
The steps involved in counting the repeated words are described below:-
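String file = "/home/girish/Desktop/D.txt":- Declares the path of the file whose words are to be counted.
FileInputStream inputStream = new FileInputStream(file):- Creates a file input stream that reads bytes from the file.
FileChannel fileChannel = inputStream.getChannel():- Gets the FileChannel object associated with the file input stream.
MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileLength):- Maps the file into memory. MappedByteBuffer is a byte buffer whose content is a memory-mapped region of the file.
Charset charset = Charset.forName("ISO-8859-1"):- Gets the ISO-8859-1 charset, which is used to create the decoder that converts the bytes of the file into characters.
Pattern pattern = Pattern.compile(".*$", Pattern.MULTILINE):- Compiles a regular expression that matches the decoded text one line at a time.
patternword.split(sequence):- Splits each matched line into words at punctuation and whitespace characters. Each non-empty word is then counted in a TreeMap, which keeps the words in sorted order.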
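The file-reading steps above (opening the stream, mapping the file and decoding the bytes) can be tried in isolation with a small sketch like the one below. The class name MappedFileReadSketch, the helper readAll and the temporary file are assumptions made for illustration; they are not part of the original program.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedFileReadSketch {

    // Hypothetical helper: maps the file read-only and decodes it as ISO-8859-1.
    static String readAll(String file) throws IOException {
        // try-with-resources closes the stream and its channel automatically.
        try (FileInputStream inputStream = new FileInputStream(file);
             FileChannel fileChannel = inputStream.getChannel()) {
            long fileLength = fileChannel.size();
            MappedByteBuffer mbb =
                    fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileLength);
            Charset charset = Charset.forName("ISO-8859-1");
            CharBuffer charBuffer = charset.newDecoder().decode(mbb);
            return charBuffer.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the path used in the article.
        Path temp = Files.createTempFile("words", ".txt");
        temp.toFile().deleteOnExit();
        Files.write(temp, "Angel Angel Ange".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readAll(temp.toString()));
    }
}
```

Returning a String copies the decoded characters, which is convenient for small files. For large files it may be preferable to keep working on the CharBuffer directly, as the full program does.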
Calculatingword.java
import java.io.FileInputStream;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Calculatingword {
    public static void main(String[] args) throws Exception {
        // File whose words are to be counted.
        String file = "/home/girish/Desktop/D.txt";
        // try-with-resources closes the stream and its channel automatically.
        try (FileInputStream inputStream = new FileInputStream(file);
             FileChannel fileChannel = inputStream.getChannel()) {
            long fileLength = fileChannel.size();
            // Map the whole file into memory, read-only.
            MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0,
                    fileLength);
            // Decode the mapped bytes into characters.
            Charset charset = Charset.forName("ISO-8859-1");
            CharsetDecoder cd = charset.newDecoder();
            CharBuffer charBuffer = cd.decode(mbb);
            System.out.println("========================" +
                    "File from where words are counted=============");
            System.out.println(charBuffer);
            System.out.println("========================" +
                    "===================================");
            // Matches the text one line at a time.
            Pattern pattern = Pattern.compile(".*$", Pattern.MULTILINE);
            // Splits a line at punctuation and whitespace characters.
            Pattern patternword = Pattern.compile("[\\p{Punct}\\s]");
            Matcher lineMatcher = pattern.matcher(charBuffer);
            // TreeMap keeps the words in sorted order.
            Map<String, Integer> map = new TreeMap<>();
            while (lineMatcher.find()) {
                CharSequence sequence = lineMatcher.group();
                String[] words = patternword.split(sequence);
                for (String word : words) {
                    // Adjacent separators produce empty strings; skip them.
                    if (word.length() > 0) {
                        Integer times = map.get(word);
                        map.put(word, times == null ? 1 : times + 1);
                    }
                }
            }
            System.out.println("No of times words repeted are :" + "\n" + map);
        }
    }
}
Output of the program:-
======File from where words are counted=======
Angeles Angeles Angeles Angeles Angeles
Angeles Angele Angele Angele Angele
Angele Angele Angele Angele Angele Angel
Angel Angel Angel Angel Angel Angel
Angel Angel Angel Angel Ange Ange
Ange Ange Ange Ange Ange Ange Ange
Ange dcjk
4645423 24221 224121 5245241 55241
542541 441 5541 41441
===================================
No of times words repeted are :
{224121=1, 24221=1, 41441=1, 441=1,
4645423=1, 5245241=1, 542541=1,
55241=1, 5541=1, Ange=10,
Angel=11, Angele=9, Angeles=6, dcjk=1}
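The counting logic can also be exercised without a file. The following is a minimal sketch, not the original program: it applies a line pattern and a separator pattern to an in-memory string, uses Map.merge (available since Java 8) in place of an explicit null check, and then keeps only the words that occur more than once. The class name WordCountSketch and the helpers countWords and repeatedOnly are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCountSketch {

    // Matches the text one line at a time.
    private static final Pattern LINE = Pattern.compile(".*$", Pattern.MULTILINE);
    // Splits a line at punctuation and whitespace characters.
    private static final Pattern SEPARATOR = Pattern.compile("[\\p{Punct}\\s]");

    // Hypothetical helper: counts every non-empty word in the text.
    static Map<String, Integer> countWords(CharSequence text) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher lineMatcher = LINE.matcher(text);
        while (lineMatcher.find()) {
            for (String word : SEPARATOR.split(lineMatcher.group())) {
                if (!word.isEmpty()) {
                    // Inserts 1 for a new word, otherwise adds 1 to the old count.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Hypothetical helper: keeps only the words seen more than once.
    static Map<String, Integer> repeatedOnly(Map<String, Integer> counts) {
        Map<String, Integer> repeated = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                repeated.put(entry.getKey(), entry.getValue());
            }
        }
        return repeated;
    }

    public static void main(String[] args) {
        String text = "Angel Angel Ange\nAnge dcjk, Angel";
        Map<String, Integer> counts = countWords(text);
        System.out.println(counts);               // {Ange=2, Angel=3, dcjk=1}
        System.out.println(repeatedOnly(counts)); // {Ange=2, Angel=3}
    }
}
```

Filtering on a count greater than one is a design choice for cases where only the repeated words are of interest; the full program above reports every word, including those that occur once.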