Converting PDF into XML

I have to convert a PDF into XML without losing any text. Please suggest a good approach.

September 27, 2012 at 11:53 AM

Here is code that converts a PDF to XML. You need the iText API (the com.lowagie packages from iText 2.x) to run it.

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import com.lowagie.text.pdf.PRTokeniser;
import com.lowagie.text.pdf.PdfReader;

public class ConvertPDFToXML {
        static StreamResult streamResult;
        static TransformerHandler handler;
        static AttributesImpl atts;

        // Exceptions propagate so that failures are reported instead of silently ignored.
        public static void main(String[] args) throws Exception {
                PdfReader reader = new PdfReader("C:\\hello.pdf");
                StringBuffer strbufe = new StringBuffer();
                // Read every page. getPageContent() returns the full page content even
                // when the page's /Contents entry is an array of streams.
                for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                        byte[] streamBytes = reader.getPageContent(i);
                        PRTokeniser tokenizer = new PRTokeniser(streamBytes);
                        // Keep only string operands, i.e. the text shown by Tj/TJ.
                        // These are raw bytes: text in fonts with custom encodings
                        // (e.g. Identity-H) will not come out as readable characters.
                        while (tokenizer.nextToken()) {
                                if (tokenizer.getTokenType() == PRTokeniser.TK_STRING) {
                                        strbufe.append(tokenizer.getStringValue());
                                }
                        }
                }
                reader.close();

                streamResult = new StreamResult("data.xml");
                initXML();
                process(strbufe.toString());
                closeXML();
        }

        public static void initXML() throws TransformerConfigurationException,
                        SAXException {
                SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory
                                .newInstance();

                handler = tf.newTransformerHandler();
                Transformer serializer = handler.getTransformer();
                // UTF-8 can represent every character; ISO-8859-1 cannot.
                serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
                serializer.setOutputProperty(
                                "{http://xml.apache.org/xslt}indent-amount", "4");
                serializer.setOutputProperty(OutputKeys.INDENT, "yes");
                handler.setResult(streamResult);
                handler.startDocument();
                atts = new AttributesImpl();
                handler.startElement("", "", "Roseindia", atts);
        }

        // Writes all extracted text into a single <Message> element, so no text is dropped.
        public static void process(String s) throws SAXException {
                atts.clear();
                handler.startElement("", "", "Message", atts);
                handler.characters(s.toCharArray(), 0, s.length());
                handler.endElement("", "", "Message");
        }

        public static void closeXML() throws SAXException {
                handler.endElement("", "", "Roseindia");
                handler.endDocument();
        }
}
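The tokenizer loop above appends every string operand to one buffer and discards the operators between them, which is why all of a page's text lands in a single tag. The line structure lives in the text-positioning operators (Td, TD, T*, ' and "). Below is a simplified, dependency-free sketch of grouping strings into lines by watching those operators. It assumes you already have the decoded content stream as text (for example new String(reader.getPageContent(1), "ISO-8859-1")), reads only literal (...) strings, and ignores hex strings, font encodings and Tm positioning, so treat it as a starting point rather than a complete text extractor. The class name ContentStreamLines is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class ContentStreamLines {

    // Groups the text shown in a decoded, uncompressed page content stream into lines.
    // Simplified sketch: only literal strings "(...)" are read; hex strings,
    // font encodings and Tm (absolute positioning) are ignored.
    public static List<String> extractLines(String content) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();    // text of the current line
        StringBuilder pending = new StringBuilder(); // string operands awaiting an operator
        List<String> numbers = new ArrayList<>();    // numeric operands awaiting an operator
        int i = 0;
        int n = content.length();
        while (i < n) {
            char c = content.charAt(i);
            if (c == '(') {
                i = readLiteral(content, i + 1, pending);
                continue;
            }
            if (Character.isWhitespace(c) || c == '[' || c == ']' || c == ')') {
                i++; // separators; TJ array brackets carry no text themselves
                continue;
            }
            int start = i;
            while (i < n && !Character.isWhitespace(content.charAt(i))
                    && "()[]".indexOf(content.charAt(i)) < 0) {
                i++;
            }
            String tok = content.substring(start, i);
            if (tok.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
                numbers.add(tok); // e.g. Td offsets or TJ kerning values
                continue;
            }
            if (tok.startsWith("/")) {
                continue; // names such as /F1 are not text
            }
            switch (tok) {
                case "Tj":
                case "TJ":
                    line.append(pending); // show text on the current line
                    break;
                case "'":
                case "\"":
                    flush(line, lines);   // move to the next line, then show text
                    line.append(pending);
                    break;
                case "Td":
                case "TD":
                    // Only a non-zero vertical offset (the second operand) starts a new line.
                    if (numbers.size() >= 2
                            && Double.parseDouble(numbers.get(numbers.size() - 1)) != 0) {
                        flush(line, lines);
                    }
                    break;
                case "T*":
                case "ET":
                    flush(line, lines);
                    break;
                default:
                    break; // every other operator is ignored
            }
            numbers.clear();
            pending.setLength(0);
        }
        flush(line, lines);
        return lines;
    }

    // Reads a literal string that starts just after '(' into out and returns the
    // index after the matching ')'. Balanced nested parentheses are kept;
    // octal escapes such as \053 are not decoded in this sketch.
    private static int readLiteral(String s, int i, StringBuilder out) {
        int depth = 1;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (c == '\\' && i < s.length()) {
                char e = s.charAt(i++);
                out.append(e == 'n' ? '\n' : e == 't' ? '\t' : e == 'r' ? '\r' : e);
            } else if (c == ')' && --depth == 0) {
                return i;
            } else {
                if (c == '(') {
                    depth++;
                }
                out.append(c);
            }
        }
        return i;
    }

    private static void flush(StringBuilder line, List<String> lines) {
        if (line.length() > 0) {
            lines.add(line.toString());
            line.setLength(0);
        }
    }

    public static void main(String[] args) {
        String page = "BT /F1 12 Tf 72 712 Td (Hello) Tj 0 -14 Td [(Wor) -20 (ld)] TJ ET";
        System.out.println(extractLines(page)); // prints [Hello, World]
    }
}
```

A Td with a zero vertical offset (such as 20 0 Td) is treated as staying on the same line, because it only moves the pen horizontally.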

September 27, 2012 at 1:22 PM

Thanks for this! It converts the entire page into one tag, but my requirement is to get one XML tag for every line in the PDF.
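Once the text is available as one string per PDF line, each line can be written as its own element with the same SAX TransformerHandler technique the answer uses. A minimal sketch using only the JDK follows. The element name Line, the number attribute and the class name LinesToXML are illustrative choices, not anything defined by iText; the root name Roseindia is carried over from the answer. Getting reliable per-line text out of the PDF is the harder part and is not shown here.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class LinesToXML {

    // Serializes each line as <Line number="n">text</Line> under a <Roseindia> root.
    public static String toXml(List<String> lines) {
        try {
            SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler handler = tf.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            AttributesImpl atts = new AttributesImpl();
            handler.startDocument();
            handler.startElement("", "", "Roseindia", atts);
            int number = 1;
            for (String line : lines) {
                atts.clear();
                // Hypothetical attribute recording the line's position.
                atts.addAttribute("", "", "number", "CDATA", String.valueOf(number++));
                handler.startElement("", "", "Line", atts);
                // characters() escapes <, & and similar characters automatically.
                handler.characters(line.toCharArray(), 0, line.length());
                handler.endElement("", "", "Line");
            }
            handler.endElement("", "", "Roseindia");
            handler.endDocument();
            return out.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(Arrays.asList("Hello", "World", "A < B & C")));
    }
}
```

Because the text goes through handler.characters(), a line such as A < B & C is written as A &lt; B &amp; C, so the output stays well-formed.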

Related Tutorials/Questions & Answers:
Java and converting a PDF to XML
Java and converting a PDF to XML  I reviewed your example - https://www.roseindia.net/tutorial/java/xml/pdftoXML.html - and on many users group, it receives high praise. I understand it uses iText but I am having a problem
Converting XML to string
Converting XML to string  Hi Friends, I had an requirement in which a java program receives an xml messages from external web application such as (jsp, php) and then it converts the received xml message into a string. could
PDF to XML Exsample
PDF to XML Exsample  I really liked the example of converting PDF to XML at the URL:https://www.roseindia.net/tutorial/java/xml/pdftoXML.html BUT, I am having problems with the packages for com.lowagie.text.
converting an xml file in relational database objects
converting an xml file in relational database objects  converting an xml file in relational database objects
pdf to xml conversion
pdf to xml conversion  i want to convert pdf file into xml file.. where i am having a table in pdf file with some headers and some data into it. i want the headers to be the tag of xml file. how can i do that using java? please
xml Converting to java using JDOM
xml Converting to java using JDOM  Hello , I am new to java and JDom so i make a Xml file and i need help to read it from java using objects , my... and getter , i dont know how to differentiate that this for example: line in xml
ModuleNotFoundError: No module named 'xml-archive-to-pdf'
ModuleNotFoundError: No module named 'xml-archive-to-pdf'  Hi, My... named 'xml-archive-to-pdf' How to remove the ModuleNotFoundError: No module named 'xml-archive-to-pdf' error? Thanks   Hi, In your
converting one file format to another
converting one file format to another  Hi ser I need a code to export data from data grid to PDF and XL format plz help me out
Converting ISO8601-compliant String to java.util.Date
Converting ISO8601-compliant String to java.util.Date  Converting ISO8601-compliant String to java.util.Date
converting string to double in java
converting string to double in java  Please post an example to converting string to double in java. Thanks!   Convert String to Double Tutorial
converting field to text sql
converting field to text sql  I wanted to convert the field to TEXT in SQL.. is it possible?   SQL - Converting the field to TEXT works SELECT CONVERT(TEXT,fld_name) FROM TABLE_NAME   SQL - Converting
PDF to Image
PDF to Image  Java code to convert PDF to Image
converting decimal to base 32
converting decimal to base 32  procedure for converting decimal to base 32   CREATE PROCEDURE RTCONVERSION ( @valueToConvert int, @convertedValue varchar(20) out ) AS declare @counter int; declare @num int; declare @x
XML
XML  How i remove a tag from xml and update it in my xml
pdf marking
pdf marking  Hi i am working on online answer sheet evaluation using jsp. It need pdf marking.drag and drop images onto pdf. please help me regards reshma
xml
xml  why the content written in xml is more secure
xml
xml  validate student login using xml for library management system
pdf to text
pdf to text  how to covert pdf file (which contain table and text) into word or excel file using itext api
pdf generation.
pdf generation.  i want to generate the data which is stored in mysql data base in pdf format with php. how i will do
xml
xml  what is name space,xml scema give an example for each   XML Namespaces provide a method to avoid element name conflicts.They are used for providing uniquely named elements and attributes in an XML document
Converting Text Files into Bzip File
Converting Text Files into Bzip File  Hi, I am facing the problem during run the program, when converting text files into Bzip file. Please guide me how do i convert the text file into bzip file in PHP. I will welcome, if anyone
Converting NSURL to NSString creating a problem
Converting NSURL to NSString creating a problem  Hi , In my iPad/iPhone universal application, we are trying to covert theNSURL to NSString which cause the application crash. Does any one have idea about converting NSURL
Converting jsp variable to java variable
Converting jsp variable to java variable  Hi how to convert java script variable to java variable on same jsp page
xml
xml  how can i remove white space and next line when i copy stream to xml file
XML
XML  please tell me how i remove one tag out of all similar type of tags in xml
XML
XML  create flat file with 20 records. Read the records using xml parser and show required details
xml
xml  what is xml   Extensible Markup Language (XML... that is both human-readable and machine-readable. It is defined in the XML 1.0... gratis open standards. The design goals of XML emphasize simplicity, generality
PDF document
PDF document  hello, How to Open a PDF document on iPhone??   You can use these There's a whole toolkit built in which lets you render PDF pages to a UIView. Check out: CGPDFDocumentCreateWithURL
pdf to database
pdf to database  Hi, I want to read the data from pdf(pdf file is having 50 fields) which is placed in database file and store that into MySQL database. I want this process untill the rows completed in the database file
upload pdf
upload pdf   i want to dispal content of pdf fil and stored into database in human readable form using php . how can i do
losing precision converting from java BigDecimal to double
losing precision converting from java BigDecimal to double  losing precision converting from java BigDecimal to double
PDF Comparator
PDF Comparator  Hi Guys, I need to develop a program which should compare a set of pdf files stored in one folder with a set of pdf files stored in another folder. Both folders should contain the same no. of pdf files with same
How to Convert PDF into rtf File Java
uses the itext api for converting PDF file into RTF file. You can find...How to Convert PDF into rtf File Java  Hi, How could in covert the PDF file to rtf file in Java Programming. Please suggest any online example
XML
XML  Design an XML to maintain book details to do the following: (i) Separate Data (ii) Exchange Data (iii) Store Data (iv) Create new language
ffmpeg converting file mp4 to mov
ffmpeg converting file mp4 to mov  Hi, How to convert the mp4 file into mov file? Thanks   Hi, You should install ffmpeg on your computer and then use the following command to convert mp4 file into mov file
pdf restriction
pdf restriction  i have certain pdf files that have restrictions on it as copy ,read,extract text etc.. i want to remove restriction by java code. so is there any way to do this? plz help thanks in advance rohit
itext pdf
itext pdf  i am generating pdf using java i want to do alignment in pdf using java but i found mostly left and right alignment in a row. i want to divide single row in 4 parts then how can i do
open pdf in uiwebview
open pdf in uiwebview  Hi, How to open pdf in uiwebview? Thanks
how to load pdf on html
how to load pdf on html  how to load pdf on html
Download PDF file
Download PDF file  How to download PDF file with JSF
Creating PDF in JAVA
Creating PDF in JAVA  How create pdf in java ? Take value from database for particular PDF
pdf to voice converter
pdf to voice converter  is it possible to implement PDF to speech converter by extracting text from pdf and then text to speech
xml output
xml output  generate an xml output in the following format <FileCount> <DOC>AA <RTF>BB <PDF>CC <Total>DD where AA=total number of .DOC files found BB=total number of .RTF files found,etc
