Rakesh
Converting PDF in to XML
2 Answer(s)      2 years and a month ago
Posted in : Java Beginners


I have to convert a PDF into XML without any loss of text. Please suggest a good approach.



September 27, 2012 at 11:53 AM


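Here is code that reads the text strings from the first page of a PDF and writes them to an XML file (data.xml). You need the iText API (the older com.lowagie packages) on the classpath to run it.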

import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.sax.*;
import javax.xml.transform.stream.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import com.lowagie.text.pdf.*;

public class ConvertPDFToXML {
        static StreamResult streamResult;
        static TransformerHandler handler;
        static AttributesImpl atts;

        public static void main(String[] args) {
                try {
                        // Read the decoded content stream of page 1. getPageContent
                        // also handles pages whose /Contents entry is an array.
                        PdfReader reader = new PdfReader("C:\\hello.pdf");
                        byte[] streamBytes = reader.getPageContent(1);
                        reader.close();
                        PRTokeniser tokenizer = new PRTokeniser(streamBytes);

                        // Collect every string token (the operands of the text operators).
                        StringBuilder text = new StringBuilder();
                        while (tokenizer.nextToken()) {
                                if (tokenizer.getTokenType() == PRTokeniser.TK_STRING) {
                                        text.append(tokenizer.getStringValue());
                                }
                        }

                        streamResult = new StreamResult(new File("data.xml"));
                        initXML();
                        process(text.toString());
                        closeXML();
                } catch (Exception e) {
                        // Report failures instead of silently swallowing them.
                        e.printStackTrace();
                }
        }

        // Set up an indented SAX serializer and open the root element.
        public static void initXML() throws TransformerConfigurationException,
                        SAXException {
                SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory
                                .newInstance();

                handler = tf.newTransformerHandler();
                Transformer serializer = handler.getTransformer();
                serializer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
                serializer.setOutputProperty(
                                "{http://xml.apache.org/xslt}indent-amount", "4");
                serializer.setOutputProperty(OutputKeys.INDENT, "yes");
                handler.setResult(streamResult);
                handler.startDocument();
                atts = new AttributesImpl();
                handler.startElement("", "", "Roseindia", atts);
        }

        // Write the entire extracted text into one <Message> element
        // so that nothing is dropped.
        public static void process(String s) throws SAXException {
                atts.clear();
                handler.startElement("", "", "Message", atts);
                handler.characters(s.toCharArray(), 0, s.length());
                handler.endElement("", "", "Message");
        }

        // Close the root element and finish the document.
        public static void closeXML() throws SAXException {
                handler.endElement("", "", "Roseindia");
                handler.endDocument();
        }
}


September 27, 2012 at 1:22 PM


Thanks for this! It converts the entire page into one tag. My requirement is one XML tag for every line in the PDF.
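
One possible way to get one tag per line, offered as a sketch rather than a tested fix. The string tokens collected by the code above carry no line breaks, so this assumes you already have the page text with one PDF line per text line (how you extract it that way is not shown here). The class name LinesToXml and the element names Document and Line are made up for illustration. It uses only the JDK's XMLStreamWriter, which also escapes characters such as < and & for you.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class LinesToXml {

    // Hypothetical helper: wraps each non-blank line of already-extracted
    // text in its own <Line> element under a single <Document> root.
    public static String toXml(String extractedText) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("Document");
            // Assumption: the input has one PDF line per text line.
            for (String line : extractedText.split("\\r?\\n")) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                xml.writeStartElement("Line");
                xml.writeCharacters(line); // escapes <, & and similar
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml("Hello World\nPrice < 10 & rising\n"));
    }
}
```

To write a file instead of a string, you could pass a FileWriter (or an OutputStream) to createXMLStreamWriter in place of the StringWriter.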


