Courtney
Retrieve a list of words from a website and show a word count plus a specified number of most frequently occurring words
a year and 6 months ago
Posted in : Java Beginners

I have to:
1. Retrieve the document text from the web (provided by the utility class).
2. Filter the desired "words" from the document and, one by one, store each word as a key in a Map<String,Integer> object whose value is the number of occurrences of that word.
3. Read the (word, num_occurrences) map entry pairs into an array/list structure of my choice.
4. Sort the pair list by num_occurrences.
5. Print the total number of words processed, the number of unique words, and the N pairs with the largest number of occurrences.
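As a rough sketch of step 2, assuming the same `[A-Za-z]{5,}` pattern used in the main class below and a plain string standing in for the downloaded page, each match can be registered with `Map.merge`, which stores 1 for a new key and otherwise adds 1 to the existing count. The class and method names (`WordCountSketch`, `countWords`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating step 2: count occurrences of each word.
public class WordCountSketch {

    // Same pattern as DSProg3: runs of 5 or more ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]{5,}");

    /** Returns a map from lower-cased word to its number of occurrences. */
    public static Map<String, Integer> countWords(String content) {
        Map<String, Integer> wordCount = new HashMap<>();
        Matcher match = WORD.matcher(content);
        while (match.find()) {
            String word = match.group().toLowerCase();
            // merge: put 1 if the word is absent, otherwise add 1 to its count
            wordCount.merge(word, 1, Integer::sum);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            countWords("Guitar guitar GUITAR music, Music and a song");
        System.out.println(counts.get("guitar"));      // prints 3
        System.out.println(counts.get("music"));       // prints 2
        System.out.println(counts.containsKey("song")); // prints false (only 4 letters)
    }
}
```

Because each word is lower-cased before `merge`, "Guitar" and "GUITAR" land on the same key; the total number of words processed equals the sum of all map values.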

Here's what I have so far. The first class is the WebDoc utility class and the second is the main class. I have added commented-out blocks where the new code should go. Please help!

package util;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.HTML;
import java.io.InputStreamReader;
import java.io.IOException;
import java.net.URL;
import java.net.MalformedURLException;

public class WebDoc {

  public static String getBodyContent(String urlstr)
          throws MalformedURLException, IOException {
    /*
     * The following convoluted code is necessary because getParser()
     * is a protected method in HTMLEditorKit.

     * We create an anonymous extension of HTMLEditorKit with a public
     * getParser method calling the protected method of the superclass.
     */
    HTMLEditorKit.Parser parser = new HTMLEditorKit() {

      @Override
      public HTMLEditorKit.Parser getParser() {
        return super.getParser();
      }

    }.getParser();

    class DocStatus {
      // StringBuilder avoids re-copying the whole string on every append
      public StringBuilder content = new StringBuilder();
      public boolean body_started = false;
    }

    final DocStatus status = new DocStatus();

    HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {

      // handle the tags: look for the BODY tag
      @Override
      public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.BODY) {
          status.body_started = true;
        }
      }

      // handle the text between tags: concatenate all text after BODY tag
      @Override
      public void handleText(char[] text, int position) {
        if (status.body_started) {
          status.content.append(text).append(' ');
        }
      }
    };

    URL url = new URL(urlstr);

    // try-with-resources closes the connection stream even if parsing fails
    try (InputStreamReader r = new InputStreamReader(url.openStream())) {
      parser.parse(r, callback, true);
    }

    return status.content.toString();
  }
}
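To try the callback logic without a network connection, one option is to feed an in-memory HTML string through `javax.swing.text.html.parser.ParserDelegator`, which is the parser `HTMLEditorKit.getParser()` returns by default and which has a public constructor, so the anonymous subclass trick is not needed. This is a hedged sketch: the class name `BodyTextSketch` and its `extract` method are hypothetical, and it collects text in a `StringBuilder` rather than by repeated `String` concatenation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical offline variant of WebDoc.getBodyContent for testing the callback.
public class BodyTextSketch {

    /** Returns all text that appears after the BODY start tag, space-separated. */
    public static String extract(String html) throws IOException {
        StringBuilder content = new StringBuilder();
        boolean[] bodyStarted = {false}; // mutable flag captured by the callback

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.BODY) {
                    bodyStarted[0] = true;
                }
            }

            @Override
            public void handleText(char[] text, int pos) {
                if (bodyStarted[0]) {
                    content.append(text).append(' ');
                }
            }
        };

        // ParserDelegator is public, so it can be instantiated directly.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Ignored title</title></head>"
                    + "<body><p>Jimi Hendrix played guitar</p></body></html>";
        System.out.println(extract(html));
    }
}
```

Text inside `<head>` (such as the title) arrives before the BODY tag and is therefore skipped, matching the behaviour of `getBodyContent`.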

package dsprog3;

import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import util.WebDoc;

public class DSProg3 {
    public static void main(String[] args) {
        String url;

        //test URLs
        // https: URL.openStream() does not follow an http-to-https redirect
        url = "https://en.wikipedia.org/wiki/Jimi_Hendrix";

        final int N = 25; //the number of word/frequency pairs to print

        //word pattern recognizes a string of 5 or more letters
        String word_pattern = "[A-Za-z]{5,}";

        String content = null;
        try {
            content = WebDoc.getBodyContent(url); // get body of the web document
        } catch (Exception ex) {
            ex.printStackTrace();
            System.exit(1);
        }

        Map<String,Integer> wordCount = new HashMap<String,Integer>();

        int total_words = 0;
        Matcher match = Pattern.compile(word_pattern).matcher(content);
        while(match.find()){
            ++total_words;
            //get the next word which matches the word_pattern
            //and normalize it by making it lower case
            String word = match.group().toLowerCase();

            //System.out.println(word); //use this for testing

            /**ADD CODE
              *
              * "register" one more occurrence of key, word, in the wordCount map
              */    
        }


        //System.out.println(wordCount); //use this for testing

        //use this class as is or modify it
        class WordPair {
            String word;
            Integer count; // number of occurrences
            WordPair(String word, Integer count) {
                this.word = word;
                this.count = count;
            }
        }

        /**ADD CODE
         *
         * Create an array/list structure to hold WordPair objects
         * Iterate through wordCount and store the Map entry pairs
         * into the array/list structure
         */


        /**ADD CODE
         *
         * Create a comparator for WordPair objects which compares by
         * the count component
         *
         * Then sort the array/list using this comparator
         */


        /**ADD CODE
         *
         * Print
         *      total_words
         *      # of unique words
         *      the N entries in the array/list corresponding to the
         *      pairs with the highest count values
         */
    }

}
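One way to approach the remaining ADD CODE blocks (steps 3 to 5) is sketched below, assuming a `Map<String,Integer>` already filled as in step 2. It copies the entries into an `ArrayList` of `WordPair` objects, sorts the list with a comparator on `count` in descending order so the N most frequent words come first, and prints the totals. `WordPair` mirrors the local class in `DSProg3`; `TopWordsSketch` and `sortedPairs` are hypothetical names, and the sample map is made-up data standing in for a real page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopWordsSketch {

    // Mirrors DSProg3's WordPair; static nested here so a helper method can return it.
    static class WordPair {
        String word;
        Integer count; // number of occurrences
        WordPair(String word, Integer count) {
            this.word = word;
            this.count = count;
        }
    }

    /** Steps 3-4: copy the map entries into a list and sort by count, highest first. */
    static List<WordPair> sortedPairs(Map<String, Integer> wordCount) {
        List<WordPair> pairs = new ArrayList<>();
        for (Map.Entry<String, Integer> e : wordCount.entrySet()) {
            pairs.add(new WordPair(e.getKey(), e.getValue()));
        }
        // Compare by count only; reversed() puts the largest counts first.
        Comparator<WordPair> byCount = Comparator.comparing((WordPair p) -> p.count);
        pairs.sort(byCount.reversed());
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new HashMap<>();
        wordCount.put("guitar", 7);
        wordCount.put("hendrix", 12);
        wordCount.put("music", 3);
        int totalWords = 22; // in DSProg3 this is total_words from the matcher loop
        final int N = 2;

        List<WordPair> pairs = sortedPairs(wordCount);

        // Step 5: totals, then the N most frequent pairs (fewer if the list is shorter).
        System.out.println("Total words processed: " + totalWords);
        System.out.println("Unique words: " + wordCount.size());
        for (int i = 0; i < Math.min(N, pairs.size()); i++) {
            WordPair p = pairs.get(i);
            System.out.println(p.word + " " + p.count); // hendrix 12, then guitar 7
        }
    }
}
```

If you prefer an ascending sort, drop `reversed()` and print the last N elements instead. Inside `main`, the local `WordPair` class from the question works the same way; the static nested class here only exists so `sortedPairs` can be a separate method.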