There is an untold amount of scientific data in the millions of reports and studies published over the past few centuries. While much of it has been digitized into easy-to-use PDFs, the information inside may still be locked away in a form that is hard to work with in aggregate. Our recently released PDF Extract API provides a powerful way to get the raw text of a document and intelligently understand the content within. In this article, I'm going to demonstrate how we can use this API to gather and aggregate a large set of data into a unified whole.
Our Data
For this hypothetical example, I'm imagining an astronomy organization (aptly named the Department of Star Light) that examines the luminosity of stars. Before I say anything more, please note that I am neither an astronomer nor a scientist, and that all of this is hypothetical. Anyway, the DSL (it's a government agency, so of course it goes by an acronym) studies the brightness of a set of stars. Every year, it creates a multipage report on these stars. The report contains a cover page followed by twelve pages of tables, one for each month of the year.
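To make the shape of this data concrete, here is a minimal Python sketch of how one yearly report might be modeled once its tables have been pulled out, and how several reports could be flattened into a single data set. Everything in it is an assumption for illustration: the names `MonthlyTable`, `YearlyReport`, `aggregate`, and `yearly_average` are hypothetical, each table is assumed to map a star name to one luminosity reading, and none of this reflects what the PDF Extract API itself returns.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class MonthlyTable:
    """One page of the report: readings for a single month (hypothetical shape)."""
    month: int                  # 1 through 12
    readings: Dict[str, float]  # star name -> luminosity reading


@dataclass
class YearlyReport:
    """One PDF: a cover page (ignored here) plus up to twelve monthly tables."""
    year: int
    months: List[MonthlyTable]


Row = Tuple[int, int, str, float]  # (year, month, star, luminosity)


def aggregate(reports: List[YearlyReport]) -> List[Row]:
    """Flatten many yearly reports into one chronologically ordered list of rows."""
    rows: List[Row] = []
    for report in sorted(reports, key=lambda r: r.year):
        for table in sorted(report.months, key=lambda t: t.month):
            for star, value in table.readings.items():
                rows.append((report.year, table.month, star, value))
    return rows


def yearly_average(rows: List[Row]) -> Dict[Tuple[str, int], float]:
    """Average each star's readings per year, keyed by (star, year)."""
    grouped: Dict[Tuple[str, int], List[float]] = {}
    for year, _month, star, value in rows:
        grouped.setdefault((star, year), []).append(value)
    return {key: mean(values) for key, values in grouped.items()}
```

Flattening to one row per reading is only one possible design choice; it keeps the aggregate easy to sort, filter, or export to CSV regardless of how many years of reports are added later.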
Source : https://dzone.com/articles/digging-out-data-with-adobe-pdf-extract-api