Retrieve a list of words from a website and show a word count plus a specified number of most frequently occurring words


I have to:
1. Retrieve the document text from the web (provided by the utility class).
2. Filter the desired "words" from the document and, one by one, store each word as a key in a Map<String,Integer> object whose value is the number of occurrences of that word.
3. Read the (word, num_occurrences) map entry pairs into an array/list structure of my choice.
4. Sort the pair list by num_occurrences.
5. Print the total number of words processed, the number of unique words, and the N pairs that have the largest number of occurrences.
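To make step 2 concrete, here is a minimal sketch of the counting step. It assumes Java 8 or later (for `Map.merge`) and uses a hard-coded stand-in string instead of the downloaded page so it runs without network access. The class name `WordCountSketch` and the method `countWords` are hypothetical and not part of the assignment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption, not part of the assignment.
class WordCountSketch {

    // Same idea as the word pattern in the question: a run of 5 or more ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]{5,}");

    // Counts every matching word in the text, after lower-casing it.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> wordCount = new HashMap<>();
        Matcher match = WORD.matcher(text);
        while (match.find()) {
            String word = match.group().toLowerCase();
            // merge() stores 1 for a new key, otherwise adds 1 to the existing value.
            wordCount.merge(word, 1, Integer::sum);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        // Stand-in text; the real program would pass the page body here.
        String content = "Guitar guitar GUITAR blues Blues rock electric";
        Map<String, Integer> counts = countWords(content);
        System.out.println(counts.get("guitar"));       // 3
        System.out.println(counts.get("blues"));        // 2
        System.out.println(counts.containsKey("rock")); // false, fewer than 5 letters
    }
}
```

On Java versions before 8, the same effect can be had by checking `containsKey` first and then calling `put` with either 1 or the old value plus 1.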

Here's what I have so far. The first class is the WebDoc utility class and the second is the main class. I have added commented-out blocks where the new code should go. Please help!

// File: util/WebDoc.java
package util;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.HTML;
import java.io.InputStreamReader;
import java.io.IOException;
import java.net.URL;
import java.net.MalformedURLException;

public class WebDoc {

  public static String getBodyContent(String urlstr)
          throws MalformedURLException, IOException {
    /*
     * The following convoluted code is necessary because getParser()
     * is a protected method in HTMLEditorKit.

     * We create an anonymous extension of HTMLEditorKit with a public
     * getParser method calling the protected method of the superclass.
     */
    HTMLEditorKit.Parser parser = new HTMLEditorKit() {

      @Override
      public HTMLEditorKit.Parser getParser() {
        return super.getParser();
      }

    }.getParser();

    class DocStatus {
      public String content = "";
      public boolean body_started = false;
    }

    final DocStatus status = new DocStatus();

    HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {

      // handle the tags: look for the BODY tag
      @Override
      public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.BODY) {
          status.body_started = true;
        }
      }

      // handle the text between tags: concatenate all text after BODY tag
      @Override
      public void handleText(char[] text, int position) {
        if (status.body_started) {
          status.content += String.valueOf(text) + " ";
        }
      }
    };

    URL url = new URL(urlstr);

    // try-with-resources closes the stream even if parsing throws
    try (InputStreamReader r = new InputStreamReader(url.openStream())) {
      parser.parse(r, callback, true);
    }

    return status.content;
  }
}

// File: dsprog3/DSProg3.java
package dsprog3;

import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import util.WebDoc;

public class DSProg3 {
    public static void main(String[] args) {
        String url;

        //test URLs
        url = "http://en.wikipedia.org/wiki/Jimi_Hendrix";

        final int N = 25; //the number of word/frequency pairs to print

        //word pattern recognizes a string of 5 or more letters
        String word_pattern = "[A-Za-z]{5,}";

        String content = null;
        try {
            content = WebDoc.getBodyContent(url); // get body of the web document
        } catch (Exception ex) {
            ex.printStackTrace();
            System.exit(1);
        }

        Map<String,Integer> wordCount = new HashMap<String,Integer>();

        int total_words = 0;
        Matcher match = Pattern.compile(word_pattern).matcher(content);
        while(match.find()){
            ++total_words;
            //get the next word which matches the word_pattern
            //and normalize it by making it lower case
            String word = match.group().toLowerCase();

            //System.out.println(word); //use this for testing

            /**ADD CODE
              *
              * "register" one more occurrence of key, word, in the wordCount map
              */    
        }


        //System.out.println(wordCount); //use this for testing

        //use this class as is or modify it
        class WordPair {
            String word;
            Integer count; // number of occurrences
            WordPair(String word, Integer count) {
                this.word = word;
                this.count = count;
            }
        }

        /**ADD CODE
         *
         * Create an array/list structure to hold WordPair objects
         * Iterate through wordCount and store the Map entry pairs
         * into the array/list structure
         */


        /**ADD CODE
         *
         * Create a comparator for WordPair objects which compares by
         * the count component
         *
         * Then sort the array/list using this comparator
         */


        /**ADD CODE
         *
         * Print
         *      total_words
         *      # of unique words
         *      the N entries in the array/list corresponding to the
         *      pairs with the highest count values
         */
    }

}
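
One possible way to fill in the remaining ADD CODE blocks (copying the map entries into a list, sorting by count, and printing the top N) is sketched below. It is an illustration under assumptions, not the required solution: the class name `TopWordsSketch` and the method `topN` are hypothetical, the sample map stands in for the real `wordCount`, and breaking ties alphabetically is a choice the assignment does not ask for. In the real program, `total_words` is already counted in the matching loop; here it is recomputed from the map values only so the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class; in the assignment this logic would sit inside DSProg3.main.
class TopWordsSketch {

    static final class WordPair {
        final String word;
        final int count; // number of occurrences

        WordPair(String word, int count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + ": " + count;
        }
    }

    // Returns the n pairs with the highest counts, highest first.
    static List<WordPair> topN(Map<String, Integer> wordCount, int n) {
        // Step 3: copy the map entries into a list of WordPair objects.
        List<WordPair> pairs = new ArrayList<>(wordCount.size());
        for (Map.Entry<String, Integer> e : wordCount.entrySet()) {
            pairs.add(new WordPair(e.getKey(), e.getValue()));
        }

        // Step 4: compare by count, descending; ties fall back to alphabetical
        // order (an assumption, made so the output is deterministic).
        Comparator<WordPair> byCountDesc = (a, b) -> {
            int c = Integer.compare(b.count, a.count);
            return c != 0 ? c : a.word.compareTo(b.word);
        };
        pairs.sort(byCountDesc);

        // Guard against n being larger than the number of unique words.
        return new ArrayList<>(pairs.subList(0, Math.min(n, pairs.size())));
    }

    public static void main(String[] args) {
        // Stand-in for the wordCount map built from the page text.
        Map<String, Integer> wordCount = new HashMap<>();
        wordCount.put("guitar", 3);
        wordCount.put("blues", 2);
        wordCount.put("hendrix", 2);
        wordCount.put("electric", 1);

        int totalWords = 0;
        for (int c : wordCount.values()) {
            totalWords += c;
        }

        // Step 5: print the totals and the N most frequent pairs.
        final int N = 3;
        System.out.println("Total words: " + totalWords);
        System.out.println("Unique words: " + wordCount.size());
        for (WordPair p : topN(wordCount, N)) {
            System.out.println(p);
        }
    }
}
```

Sorting the whole list is the simplest approach and is fine for a single web page. If only a few entries are needed from a very large map, a bounded priority queue would avoid the full sort, at the cost of more code.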