Calculating Repeating Words Using a Regular Expression

This example shows how to count the repeated words in a file using a regular expression.

The steps involved in counting the repeated words are described below:-

String file = "/home/girish/Desktop/D.txt":- Declares the path of the file whose words are to be counted.

FileInputStream inputStream = new FileInputStream(file):- Creates a file input stream that reads bytes from the file.

FileChannel fileChannel = inputStream.getChannel():- Returns the FileChannel object that is associated with the file input stream.

MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileLength):- Maps the file into memory in read-only mode, starting at position 0 and covering fileLength bytes. A MappedByteBuffer is a byte buffer whose content is a memory-mapped region of the file.

Charset charset = Charset.forName("ISO-8859-1"):- Returns the charset object for ISO-8859-1. Its decoder is used to convert the mapped bytes into characters.
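The steps above can be tried on their own with a small sketch. The class name MappedFileSketch, the helper readMapped, and the temporary file are assumptions made for illustration only; they are not part of the original example, which reads a fixed path.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class MappedFileSketch {

    // Hypothetical helper: maps the whole file read-only and decodes it
    // to text with the ISO-8859-1 charset, following the steps above.
    static String readMapped(String file) throws IOException {
        try (FileInputStream inputStream = new FileInputStream(file);
             FileChannel fileChannel = inputStream.getChannel()) {
            long fileLength = fileChannel.size();
            MappedByteBuffer mbb =
                fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileLength);
            Charset charset = Charset.forName("ISO-8859-1");
            CharBuffer charBuffer = charset.newDecoder().decode(mbb);
            return charBuffer.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real input file (assumption).
        Path tmp = Files.createTempFile("words", ".txt");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "Angel Angel Ange".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readMapped(tmp.toString()));
    }
}
```

The try-with-resources block closes the stream and its channel once the bytes have been decoded. The decoded text is an ordinary String, so it stays usable after the file is closed.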

 

Calculatingword.java

import java.io.FileInputStream;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Calculatingword {

 public static void main(String args[]) throws Exception {
  String file = "/home/girish/Desktop/D.txt";

  // Open the file and map its whole content into memory (read-only).
  FileInputStream inputStream = new FileInputStream(file);
  FileChannel fileChannel = inputStream.getChannel();
  int fileLength = (int) fileChannel.size();
  MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0,
    fileLength);

  // Decode the mapped bytes into characters, then release the file.
  Charset charset = Charset.forName("ISO-8859-1");
  CharsetDecoder cd = charset.newDecoder();
  CharBuffer charBuffer = cd.decode(mbb);
  inputStream.close();

  System.out.println("========================" +
    "File from where words are counted=============");
  System.out.println(charBuffer);
  System.out.println("========================" +
    "===================================");

  // "pattern" matches one line at a time; "patternword" matches the
  // punctuation and whitespace characters that separate words.
  Pattern pattern = Pattern.compile(".*$", Pattern.MULTILINE);
  Pattern patternword = Pattern.compile("[\\p{Punct}\\s]");

  Matcher Lmatcher = pattern.matcher(charBuffer);
  // A TreeMap keeps the words sorted, so the result prints in order.
  Map<String, Integer> map = new TreeMap<String, Integer>();

  while (Lmatcher.find()) {
    CharSequence sequence = Lmatcher.group();
    String word[] = patternword.split(sequence);

    for (int i = 0, n = word.length; i < n; i++) {
      // Consecutive separators produce empty strings; skip them.
      if (word[i].length() > 0) {
        Integer times = map.get(word[i]);
        if (times == null) {
          times = 1;
        } else {
          times = times + 1;
        }
        map.put(word[i], times);
      }
    }
  }
  System.out.println("No of times words repeated are :" + "\n" + map);
 }
}

Output of the program:-

======File from where words are counted=======
Angeles Angeles Angeles Angeles Angeles  
Angeles Angele  Angele  Angele Angele
Angele Angele Angele Angele Angele Angel 
Angel Angel Angel Angel Angel Angel 
Angel Angel Angel Angel Ange Ange
Ange Ange Ange Ange Ange Ange Ange
Ange dcjk
4645423 24221 224121 5245241 55241
542541 441 5541 41441
===================================
No of times words repeated are :
{224121=1, 24221=1, 41441=1, 441=1,
4645423=1, 5245241=1, 542541=1,
55241=1, 5541=1, Ange=10,
Angel=11, Angele=9, Angeles=6, dcjk=1}
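
The counting step can also be tried without a file. The following is a minimal sketch, not the program's own code: the class name WordCountSketch and the helper countWords are hypothetical, the separator pattern uses a + quantifier so that runs of separators are treated as one, and the whole text is split in a single pass because \s already covers line breaks. Map.merge requires Java 8 or later.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

class WordCountSketch {

    // Same separator idea as in the program: punctuation or whitespace.
    private static final Pattern SEPARATOR = Pattern.compile("[\\p{Punct}\\s]+");

    // Hypothetical helper: counts how often each word occurs in the text.
    static Map<String, Integer> countWords(CharSequence text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : SEPARATOR.split(text)) {
            if (!word.isEmpty()) { // a leading separator yields ""
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Angel Angel, Ange\nAngel 441"));
        // prints {441=1, Ange=1, Angel=3}
    }
}
```

As in the program's output, the TreeMap sorts the keys as strings, so digits come before upper-case letters and a shorter word such as Ange comes before Angel.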
