Print PDF only with specified pages


Post by crayze » Mon Aug 06, 2018 9:16 am

Hi, my goal is to print a PDF containing only the pages that contain specified DOM elements. In other words, I want to drop (not print) the PDF pages that don't contain the wanted elements. Let's say the wanted DOM elements can be identified by some HTML class.

I know this may be hard to do, so I started by reading the code. The interesting files seem to be:
cef/libcef/browser/printing/*
components/printing/renderer/print_render_frame_helper.cc (print_web_view_helper.cc in my older 2017 branch)
So I have a basic overview of how PDF rendering (building the document preview) works in CEF/Chromium.

Setting which pages I want to render is easy: there is a std::vector<int> with the page numbers to print.

But I'm still thinking about how to predict/compute the PDF page number for a DOM element. Printing uses its own layout: viewport, DPI/scale.

My current idea is to evaluate JavaScript that tests element positions between the blink::WebLocalFrame->printBegin() and blink::WebLocalFrame->printEnd() calls.
My guess is that printBegin() switches the frame's document layout to printing mode. If it is possible to successfully evaluate JavaScript at that point, getBoundingClientRect() should return positions in the printing layout, am I right?
If JavaScript is not possible, maybe I could iterate over all elements of the blink::WebLocalFrame document and call the native equivalent of getBoundingClientRect() to get each element's current position.
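
The mapping from element positions to page numbers could be sketched roughly as below. This is only a sketch under a strong assumption, not Chromium's actual pagination: it treats the print layout as one tall column cut into pages of equal content height. `pagesForRects` and `pageHeightPx` are hypothetical names, the rectangles are assumed to be document-relative `top`/`bottom` values measured in the print layout, and the zero-based page indices are a choice of the sketch. Forced page breaks or `break-inside` rules would invalidate the equal-height assumption.

```javascript
// Hypothetical helper: map element rectangles to zero-based page indices,
// assuming every page shows exactly pageHeightPx of the print layout.
function pagesForRects(rects, pageHeightPx) {
  if (!(pageHeightPx > 0)) {
    throw new RangeError("pageHeightPx must be positive");
  }
  const pages = new Set();
  for (const { top, bottom } of rects) {
    const first = Math.max(0, Math.floor(top / pageHeightPx));
    // An element ending exactly on a page boundary does not touch the next page.
    const last = Math.max(first, Math.ceil(bottom / pageHeightPx) - 1);
    for (let page = first; page <= last; page++) {
      pages.add(page);
    }
  }
  // Sorted, de-duplicated list of pages that contain at least one element.
  return [...pages].sort((a, b) => a - b);
}
```

An element that straddles a page boundary is reported on both pages, so the resulting list errs on the side of printing one page too many rather than cutting an element in half.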

Maybe someone else has a simpler idea?

Re: Print PDF only with specified pages

Post by HarmlessDave » Mon Aug 06, 2018 11:55 am

Other approaches to consider if you control the server(s):

Some websites open up a new window to display things like receipts, showing a view of the data designed for printing. Other sites generate PDF files server-side using libraries for PHP or whatever, then return that PDF file.

Re: Print PDF only with specified pages

Post by crayze » Tue Aug 07, 2018 3:57 am

HarmlessDave wrote: Other approaches to consider if you control the server(s):
Some websites open up a new window to display things like receipts, showing a view of the data designed for printing. Other sites generate PDF files server-side using libraries for PHP or whatever, then return that PDF file.

Thanks, but that's not possible in my case. Our software loads archived websites saved from the internet :) so it's not so easy.
Those sites have content that may interest our clients. Technically, some DOM elements with text nodes are marked, and they can easily be found with a simple JavaScript snippet.
One of the features of our software is printing such a site to PDF (a presentation for clients). A single archived webpage typically comes out as a few, sometimes more than a dozen, A4 pages, and the content of interest is usually on only one of them. So the client gets many pages, of which only one is interesting and the rest are rubbish.
The goal is to print the PDF without the rubbish pages, which would improve the quality of our PDF printing service.
So you can see the task is not so trivial :)
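
Finding the marked elements from JavaScript might look something like the sketch below. The function name `collectMarkedRects` and the marker class passed to it are hypothetical, since the thread does not say how the elements are actually marked. `doc` is assumed to behave like a browser `Document`, and the scroll offset is added so the rectangles become document-relative instead of viewport-relative.

```javascript
// Collect document-relative rectangles of all elements carrying a marker class.
// `doc` is expected to behave like a browser Document; `scrollY` is the
// current vertical scroll offset (window.scrollY in a page context).
function collectMarkedRects(doc, markerClass, scrollY = 0) {
  const elements = doc.querySelectorAll("." + markerClass);
  return Array.from(elements, (el) => {
    const rect = el.getBoundingClientRect();
    return { top: rect.top + scrollY, bottom: rect.bottom + scrollY };
  });
}
```

In a page context this would be called as `collectMarkedRects(document, "wanted", window.scrollY)`, where `"wanted"` stands in for whatever class the archive uses. A class name containing special characters would need escaping (for example with `CSS.escape`) before being placed in the selector.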


It is possible to do this without modifying CEF/Chromium: print to PDF twice, once with the normal text color and once with a different color (maybe background?) for the text in the wanted elements (style="color: inverted").
Then, after converting all PDF pages to bitmaps, I can compare them pixel by pixel and find the pages where the text color changed.
But this solution has a big performance overhead; analyzing huge bitmaps pixel by pixel is lame. I believe it can be done better.
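
For reference, the comparison step of this two-pass approach could be sketched as follows. Rasterizing the PDF pages is not shown; the sketch assumes each render is already available as an array of `Uint8Array` pixel buffers, one per page, with the same dimensions and pixel format in both renders. `pagesThatDiffer` is a hypothetical name.

```javascript
// Return the indices of pages whose pixel buffers differ between two renders.
// Each render is an array of Uint8Array buffers, one per page, with identical
// dimensions and pixel format in both renders (an assumption of this sketch).
function pagesThatDiffer(renderA, renderB) {
  if (renderA.length !== renderB.length) {
    throw new Error("renders have different page counts");
  }
  const changed = [];
  for (let page = 0; page < renderA.length; page++) {
    const a = renderA[page];
    const b = renderB[page];
    let differs = a.length !== b.length;
    // Stop at the first differing byte; unchanged pages are scanned in full.
    for (let i = 0; !differs && i < a.length; i++) {
      differs = a[i] !== b[i];
    }
    if (differs) changed.push(page);
  }
  return changed;
}
```

The early exit only helps on pages that did change. Every unchanged page is still scanned completely, which is where the overhead mentioned above comes from.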

Re: Print PDF only with specified pages

Post by HarmlessDave » Tue Aug 07, 2018 2:35 pm

What about injecting your own JavaScript to delete unwanted DIVs from the page before printing?

Or scraping just the parts you want and building a new page in memory to render in a new tab? That might not work well if you need the page to contain includes and scripting from the original page, but should work for simple HTML.

Re: Print PDF only with specified pages

Post by crayze » Wed Aug 08, 2018 8:55 am

HarmlessDave wrote: What about injecting your own JavaScript to delete unwanted DIVs from the page before printing?

Or scraping just the parts you want and building a new page in memory to render in a new tab? That might not work well if you need the page to contain includes and scripting from the original page, but should work for simple HTML.

Deleting DIVs (or setting display: none on them, whatever) often destroys the page layout and the page looks horrible.
By the way, printing a text selection works almost the same way: rendering starts from the selected nodes (and contains only the selected nodes), and the parent nodes of the rest of the document are ignored. This of course gives ugly results, as I already wrote.

Setting visibility: hidden is better. It won't change element positions at all, but the PDF still contains the same number of pages, most of them blank (or with only the background), and we still don't know which ones contain the wanted elements without some graphical post-processing (this is how our printing service worked in tests).
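
A stylesheet for the visibility: hidden variant could be generated along these lines. This is a sketch, not the code from our service: `buildHideOthersCss` and the marker class are hypothetical names. It relies on the fact that a descendant can set visibility: visible even when its ancestors are hidden, so the wanted elements stay in place while everything else disappears.

```javascript
// Build a stylesheet that hides everything except elements with markerClass
// (and their descendants) without changing layout.
function buildHideOthersCss(markerClass) {
  return [
    // Hide every element; boxes keep their size and position.
    "* { visibility: hidden !important; }",
    // The class selector is more specific than "*", so it wins for marked
    // elements and everything inside them.
    `.${markerClass}, .${markerClass} * { visibility: visible !important; }`,
  ].join("\n");
}
```

The returned text would be placed in a `<style>` element injected into the page before printing. It does not reduce the page count; it only makes the pages without wanted elements blank.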

My goal is that the page with the wanted elements must look the same as when rendered normally (with all pages); its layout must match the original.

