Digging Out Data With Adobe PDF Extract API - MrLiambi's blog




Thursday, 12 August 2021

Digging Out Data With Adobe PDF Extract API

There is an untold amount of scientific data in the millions of reports and studies published over the past few centuries. While much of it has been digitized into easy-to-use PDFs, the information inside may still be locked away in a form that makes it hard to work with in aggregate. Our recently released PDF Extract API provides a powerful way to get the raw text of a document and intelligently understand the content within. In this article, I'm going to demonstrate how we can use this API to gather and aggregate a large set of data into a unified whole.

Our Data

For this hypothetical example, I'm imagining an astronomy organization (aptly named the Department of Star Light) that examines the luminosity of stars. Before I say anything more, please note that I am neither an astronomer nor a scientist, and this is all a hypothetical example. Anyway, the DSL (it's a government agency, so of course it goes by an acronym) studies the brightness of a set of stars. Every year, it creates a multipage report on these stars. The report contains a cover page followed by twelve pages of tables, one for each month of the year.
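To make the shape of that data concrete, here is a minimal Python sketch of what aggregating one year's report might look like once the tables have been pulled out of the PDF. Everything in it is an assumption made for illustration: the `star` and `luminosity` column names, the sample values, and the `aggregate_report` and `yearly_average` helpers are all hypothetical, and the sketch does not call the PDF Extract API itself.

```python
from collections import defaultdict
from statistics import mean


def aggregate_report(monthly_tables):
    """Merge per-month tables into one list of readings per star.

    monthly_tables maps a month number to a list of rows, where each row
    is a dict with hypothetical "star" and "luminosity" keys.
    """
    readings = defaultdict(list)
    # Walk the months in calendar order so each star's readings stay ordered.
    for month in sorted(monthly_tables):
        for row in monthly_tables[month]:
            readings[row["star"]].append(row["luminosity"])
    return dict(readings)


def yearly_average(readings):
    """Reduce each star's list of monthly readings to a single average."""
    return {star: mean(values) for star, values in readings.items()}


# Invented sample data: two months, two stars.
sample = {
    1: [{"star": "Alpha", "luminosity": 10.0}, {"star": "Beta", "luminosity": 4.0}],
    2: [{"star": "Alpha", "luminosity": 12.0}, {"star": "Beta", "luminosity": 6.0}],
}

merged = aggregate_report(sample)
averages = yearly_average(merged)
print(averages)
```

A real report would have twelve entries instead of two, and the rows would come from the extracted tables rather than a hand-written dictionary, but the merging step would follow the same pattern.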

Source : https://dzone.com/articles/digging-out-data-with-adobe-pdf-extract-api
