We are implementing cross-media publishing for one of our customers. They have a rather large catalogue that is already available online on a database-driven website; however, the website does not contain all of the data. Each product has some very technical tables associated with it that are only available as PDF or in the offline catalogue.
The plan is to generate all catalogues from the same database. That means we need all the data that is currently in the offline catalogue in the database. The problem is that this data only exists as Quark XPress documents. Every number in a table has its own layer and is carefully placed by hand, so there is absolutely no structure in the documents.
Does anybody here have an idea how a program could look at a page, detect what looks like a table, and output that data in some structured form?
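To make the idea more concrete, here is a rough sketch of the kind of thing I have in mind. It assumes we could somehow export every hand-placed number as a text box with page coordinates; that `(x, y, text)` format, the function names, and the tolerance value are all hypothetical. It groups boxes whose coordinates are nearly equal into rows and columns, which may well be too naive for real pages:

```python
from typing import List, Tuple

# Hypothetical export format: (x, y, text), origin at top-left of the page.
Box = Tuple[float, float, str]


def cluster(values: List[float], tol: float) -> List[float]:
    """Group sorted 1-D coordinates whose neighbours are within tol;
    return the mean of each group as its centre."""
    centres: List[float] = []
    group: List[float] = []
    for v in sorted(values):
        if group and v - group[-1] > tol:
            centres.append(sum(group) / len(group))
            group = []
        group.append(v)
    if group:
        centres.append(sum(group) / len(group))
    return centres


def nearest(centres: List[float], v: float) -> int:
    """Index of the centre closest to v."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - v))


def boxes_to_grid(boxes: List[Box], tol: float = 3.0) -> List[List[str]]:
    """Place each box into a row/column grid inferred from its position."""
    rows = cluster([y for _, y, _ in boxes], tol)
    cols = cluster([x for x, _, _ in boxes], tol)
    grid = [["" for _ in cols] for _ in rows]
    for x, y, text in boxes:
        grid[nearest(rows, y)][nearest(cols, x)] = text
    return grid


if __name__ == "__main__":
    sample = [
        (10.0, 100.2, "A"), (50.5, 99.8, "1.5"),
        (10.3, 120.0, "B"), (49.9, 120.4, "2.5"),
    ]
    print(boxes_to_grid(sample))  # [['A', '1.5'], ['B', '2.5']]
```

This only handles boxes that line up within a few points of each other; merged cells, multi-line cells, and text outside the table would need extra handling.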