Introduction
Hi, I’m Armin from SGDC, Sansan’s Cebu office, and I work in the Purchasing Area of Bill One. Almost a year ago, our team developed the highlighting feature for Bill One’s invoice matching. This feature has since become one of the core functions of Purchase Order Matching in Bill One.
Why we built it
Matching purchase orders (POs) with invoices is a necessary step in many workflows. The process often slows people down, especially when dealing with scanned PDF documents. Scanned PDFs contain no searchable text, making verification tedious and error-prone. Users are forced to scroll, zoom, and check line by line to confirm values are correct. Our goal was to make this process faster, simpler, and more reliable for our customers.
How it works
The highlighting feature is powered by several components working together in sequence. OCR and data preparation run in the background when an invoice is uploaded or when a PO is updated. When the user clicks the highlight button on a PO line, the viewer highlights the matching value in the invoice PDF.
Converting PDFs into images with Poppler
Invoices can arrive as clean PDFs, scanned copies, or low-quality files from many different systems. To process all of these consistently, we first convert each page of the invoice into an image. This is done with Poppler, an open-source PDF rendering library widely used for PDF processing. In our case, Poppler is a helper tool with one focused role: turning PDF pages into images. By standardizing invoices as images, we remove differences between text-based and scanned PDFs. This ensures our OCR engine always receives a reliable and uniform input for analysis.
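To make the conversion step concrete, here is a minimal sketch that renders every page of a PDF to PNG with Poppler's `pdftoppm` command-line tool. Calling the CLI is an assumption for illustration, since Poppler can also be used through other interfaces. The function names and the `page` file prefix are hypothetical. The 200 DPI default is the resolution our OCR images are rendered at.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_command(pdf_path: str, output_prefix: str, dpi: int = 200) -> list[str]:
    """Build a pdftoppm command that renders every page of a PDF to PNG."""
    return [
        "pdftoppm",
        "-r", str(dpi),  # render resolution in DPI
        "-png",          # write PNG files instead of the default PPM
        pdf_path,
        output_prefix,   # pdftoppm appends "-<page number>.png" to this prefix
    ]


def render_pdf_pages(pdf_path: str, output_dir: str, dpi: int = 200) -> list[Path]:
    """Render all pages and return the generated image paths in page order.

    Hypothetical helper: requires the pdftoppm binary to be on PATH.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    prefix = out / "page"
    subprocess.run(build_pdftoppm_command(pdf_path, str(prefix), dpi), check=True)
    # pdftoppm zero-pads page numbers, so a lexicographic sort keeps page order.
    return sorted(out.glob("page-*.png"))
```

Keeping the command construction separate from the subprocess call makes the rendering settings easy to check without having Poppler installed.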
OCR analysis by Sansan’s R&D unit
The image pages and the purchase order data are sent to a service developed by Sansan’s R&D team. This service uses OCR technology and performs the following steps:
- Text recognition: Detects characters in the image and converts them into machine-readable text.
- Coordinate mapping: Records bounding boxes that mark the exact position of recognized text.
- Comparison with PO data: Matches detected values against fields in the purchase order.
The output is structured data that contains both the recognized strings and their coordinates. This structure allows the system to link PO values to their exact position inside the invoice PDF.
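The sketch below shows one way such a structure could look and how PO values might be linked to positions. The class names, field names, and the exact-match-after-normalization rule are all assumptions for illustration. The real service's schema and matching logic are not described here and are likely more tolerant of OCR noise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    page: int   # zero-based page index (assumed convention)
    x0: float   # left edge in image pixels
    y0: float   # top edge in image pixels
    x1: float   # right edge in image pixels
    y1: float   # bottom edge in image pixels


@dataclass(frozen=True)
class RecognizedText:
    text: str
    box: BoundingBox


def normalize(value: str) -> str:
    """Drop spaces and thousands separators, and ignore letter case."""
    return value.replace(",", "").replace(" ", "").casefold()


def match_po_fields(
    po_fields: dict[str, str], ocr_items: list[RecognizedText]
) -> dict[str, BoundingBox]:
    """Map each PO field name to the box of the first OCR item with an equal normalized value."""
    matches: dict[str, BoundingBox] = {}
    for field_name, po_value in po_fields.items():
        target = normalize(po_value)
        for item in ocr_items:
            if normalize(item.text) == target:
                matches[field_name] = item.box
                break  # keep the first match only
    return matches
```

Fields with no matching text are simply absent from the result, so a caller can tell which PO lines have nothing to highlight.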
Transformation in Bill One
The OCR output alone cannot be plotted directly on the original PDF. There are two challenges we handle in the transformation layer:
Rotation handling
Invoices don’t always arrive upright. Some pages are rotated 90° or 180°. If we used raw OCR coordinates, the highlights would appear in the wrong spots.
The transformation layer detects and corrects for page rotation before moving forward.
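As an illustration of the correction step, the sketch below maps a rectangle from a rotated page image back to the upright page. It assumes the rotation is already known, is a clockwise multiple of 90°, and that both coordinate systems have their origin at the top-left with y growing downward. The function names are hypothetical, and how the rotation is detected is outside this sketch.

```python
def unrotate_point(x, y, rotation, page_width, page_height):
    """Map a point on a page image rotated clockwise by `rotation` degrees
    back to the upright page.

    page_width and page_height are the upright page's dimensions,
    in the same units as x and y.
    """
    rotation %= 360
    if rotation == 0:
        return (x, y)
    if rotation == 90:
        return (y, page_height - x)
    if rotation == 180:
        return (page_width - x, page_height - y)
    if rotation == 270:
        return (page_width - y, x)
    raise ValueError(f"unsupported rotation: {rotation}")


def unrotate_rect(rect, rotation, page_width, page_height):
    """Map an (x0, y0, x1, y1) rectangle and re-order its corners afterwards."""
    x0, y0, x1, y1 = rect
    ax, ay = unrotate_point(x0, y0, rotation, page_width, page_height)
    bx, by = unrotate_point(x1, y1, rotation, page_width, page_height)
    # Rotation can swap which corner is top-left, so normalize with min/max.
    return (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))
```

Normalizing the corners after the mapping matters because a rotated rectangle's top-left corner usually ends up as a different corner of the upright rectangle.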
DPI conversion
OCR runs on images rendered at 200 DPI during the PDF-to-image conversion process. PDFs, on the other hand, use a coordinate system defined at 72 points per inch by default. Without converting between the two, highlights would not align with the text in the PDF. We fix this by rescaling the OCR coordinates with the formula below:
xyCoordinates * (72 / 200)
For example, if OCR detects text at (400, 600) in the 200 DPI image, it maps to (144, 216) in PDF points. This conversion moves OCR image coordinates into the PDF coordinate system, so highlight rectangles line up with the text inside the invoice PDF.
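The same rescaling can be written as a small helper. This is a minimal sketch: the constant and function names are hypothetical, and it only covers the scale factor described above, not any other differences between image and PDF coordinate systems.

```python
PDF_POINTS_PER_INCH = 72   # default PDF unit: 1/72 inch
OCR_IMAGE_DPI = 200        # resolution used for the PDF-to-image conversion


def pixels_to_points(value, dpi=OCR_IMAGE_DPI):
    """Convert one pixel coordinate at `dpi` into PDF points."""
    # Multiply before dividing so whole-number results stay exact in floating point.
    return value * PDF_POINTS_PER_INCH / dpi


def rect_to_points(rect, dpi=OCR_IMAGE_DPI):
    """Convert an (x0, y0, x1, y1) pixel rectangle into PDF points."""
    return tuple(pixels_to_points(v, dpi) for v in rect)


print(pixels_to_points(400), pixels_to_points(600))  # prints "144.0 216.0"
```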
Once rotation and DPI adjustments are complete, the transformation layer repackages the data. The data is then passed in a format that the PDF viewer can read and use for rendering highlights.
Interactive highlighting in the PDF Viewer
The frontend brings the processed results to life for the user. When the highlight button on a PO line is clicked, the viewer uses the transformed coordinates to draw an annotation rectangle in the invoice PDF at the matching position. The rectangle blinks briefly to make the highlighted value stand out, and the viewer auto-scrolls to the highlighted region. Even if the value is deep inside a long invoice, the user is taken straight to the right spot.
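The auto-scroll step comes down to a small calculation. The sketch below is not our viewer's code: it assumes pages are stacked vertically in one scrollable column, and the function name, parameters, and the choice to center the highlight are hypothetical. It is written in Python only to show the arithmetic.

```python
def scroll_offset_for_highlight(
    page_heights, page_index, rect_top, rect_height, viewport_height, page_gap=0.0
):
    """Return the vertical scroll offset that centers a highlight in the viewport.

    page_heights: rendered height of each page, in viewer pixels
    rect_top:     top of the highlight, measured from the top of its page
    page_gap:     vertical spacing between consecutive pages
    """
    if not 0 <= page_index < len(page_heights):
        raise IndexError("page_index out of range")
    # Distance from the top of the document to the top of the target page.
    page_start = sum(page_heights[:page_index]) + page_gap * page_index
    rect_center = page_start + rect_top + rect_height / 2
    total_height = sum(page_heights) + page_gap * (len(page_heights) - 1)
    max_offset = max(0.0, total_height - viewport_height)
    # Clamp so the viewer never scrolls past the start or end of the document.
    return min(max(rect_center - viewport_height / 2, 0.0), max_offset)
```

Clamping keeps highlights near the first or last page visible without scrolling into empty space.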
Here’s what it looks like in action:
[Image: demo of the highlighting feature in the invoice PDF viewer]

What’s next
The highlighting feature is only the beginning of our work on PO and invoice matching. We are now building PO Matching V2, which adds smarter and more detailed analysis. Instead of only checking single fields, V2 also looks at full invoice line tables. That means quantities, unit prices, and amounts can be itemized and matched with PO data. With this, customers can verify entire line items, not just totals or single fields. PO Matching V2 is still in development, but it marks a big step toward automation in Bill One.