How to read the .doc/ .docx file in Java Program

Hi,

I am beginner in Java programming language. Can anybody explain How to read .doc file in Java language. Please provide any online reference so that i try it.

Regards,

View Answers

February 25, 2011 at 11:08 AM

Hi,

In java programming language we normally using the POI Library to read the word document file. For doing this we will make class HWPFDocument which throw all of the Word file data and the class WordExtractor extract the text from the Word Document. The method getParagraphText() of WordExtractor class get the text from the word file as an array and display the data on word file. This way you can read the .doc or .docx file in Java programming language.

Thanks

August 30, 2012 at 6:52 PM

package doc.reader;

import org.apache.poi.poifs.filesystem.*; import org.apache.poi.hpsf.DocumentSummaryInformation; import org.apache.poi.hwpf.*; import org.apache.poi.hwpf.extractor.*; import org.apache.poi.hwpf.usermodel.HeaderStories;

import java.io.*;

public class ReadDocFileFromJava {

public static void main(String[] args) {
    /**This is the document that you want to read using Java.**/
    String fileName = "d\\Resume.doc";

    /**Method call to read the document (demonstrate some useage of POI)**/
    readMyDocument(fileName);

}
public static void readMyDocument(String fileName){
    POIFSFileSystem fs = null;
    try {
        fs = new POIFSFileSystem(new FileInputStream(fileName));
        HWPFDocument doc = new HWPFDocument(fs);

        /** Read the content **/
        readParagraphs(doc);

        int pageNumber=1;

        /** We will try reading the header for page 1**/
        readHeader(doc, pageNumber);

        /** Let's try reading the footer for page 1**/
        readFooter(doc, pageNumber);

        /** Read the document summary**/
        readDocumentSummary(doc);

    } catch (Exception e) {
        e.printStackTrace();
    }
}   

public static void readParagraphs(HWPFDocument doc) throws Exception{
    WordExtractor we = new WordExtractor(doc);

    /**Get the total number of paragraphs**/
    String[] paragraphs = we.getParagraphText();
    System.out.println("Total Paragraphs: "+paragraphs.length);

    for (int i = 0; i < paragraphs.length; i++) {

        System.out.println("Length of paragraph "+(i +1)+": "+ paragraphs[i].length());
        System.out.println(paragraphs[i].toString());

    }

}

public static void readHeader(HWPFDocument doc, int pageNumber){
    HeaderStories headerStore = new HeaderStories( doc);
    String header = headerStore.getHeader(pageNumber);
    System.out.println("Header Is: "+header);

}

public static void readFooter(HWPFDocument doc, int pageNumber){
    HeaderStories headerStore = new HeaderStories( doc);
    String footer = headerStore.getFooter(pageNumber);
    System.out.println("Footer Is: "+footer);

}

public static void readDocumentSummary(HWPFDocument doc) {
    DocumentSummaryInformation summaryInfo=doc.getDocumentSummaryInformation();
    String category = summaryInfo.getCategory();
    String company = summaryInfo.getCompany();
    int lineCount=summaryInfo.getLineCount();
    int sectionCount=summaryInfo.getSectionCount();
    int slideCount=summaryInfo.getSlideCount();

    System.out.println("---------------------------");
    System.out.println("Category: "+category);
    System.out.println("Company: "+company);
    System.out.println("Line Count: "+lineCount);
    System.out.println("Section Count: "+sectionCount);
    System.out.println("Slide Count: "+slideCount);

}

}

How to read the .doc/ .docx file in Java Program

Questions by Category