Thank you for your kind words; positive feedback is a great motivator to keep working on the plugin and supporting its users.
I have found that PDF meta data is less consistent than the meta data found in images. A lot depends on the application that created the document.
It would be very helpful if you could post a link to one or a few of your documents, so I can examine them and see what’s available for you to work with. If you would rather send them by e-mail, you can go to the Contact Us page and give me your contact information; I will then send you my e-mail address:
Fair Trade Judaica/Contact Us
As you’ve found, the ALL_PDF pseudo variable can be helpful. An easy way to use it for inspecting your documents is to create a post or page and add this [mla_gallery] shortcode to display the information:
<h3>ALL_PDF</h3>
[mla_gallery post_mime_type='application/pdf' post_parent=all size=none mla_caption='{+base_file+}<br>{+pdf:ALL_PDF+}' columns=2]
With this approach you can avoid the effort of creating a custom field and mapping rule. Let me know if that helps, and consider posting or sending me some of the results.
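As a rough cross-check outside WordPress, here is a minimal Python sketch (not part of MLA, and not how MLA parses documents) that scans a PDF file for the standard document-information entries. It assumes the entries are stored uncompressed as literal strings, e.g. `/Title (My Document)`; it does not handle hex strings, escaped parentheses, compressed object streams or XMP metadata, and it may pick up the same key from another dictionary, such as a bookmark `/Title`. The function name `scan_info_keys` is hypothetical.

```python
import re
import sys

# Standard document-information keys (Trapped omitted).
STANDARD_KEYS = ("Title", "Author", "Subject", "Keywords",
                 "Creator", "Producer", "CreationDate", "ModDate")

def scan_info_keys(data: bytes) -> dict:
    """Return {key: value} for keys written as /Key (literal string).

    Simplified sketch: only the first match per key is returned, and
    values containing ')' or using hex/UTF-16 encoding are not handled.
    """
    found = {}
    for key in STANDARD_KEYS:
        # "/Key", optional whitespace, then "(value)" with no ')' inside
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(([^)]*)\)"
        match = re.search(pattern, data)
        if match:
            found[key] = match.group(1).decode("latin-1")
    return found

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_info.py document.pdf
    with open(sys.argv[1], "rb") as handle:
        for key, value in scan_info_keys(handle.read()).items():
            print(f"{key}: {value}")
```

If this finds nothing in a document that clearly has a title, the information is probably compressed or stored only as XMP, which is one more reason the ALL_PDF display is the better first step.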
I am confident we can get this working for you, if your documents contain the information you want. I look forward to hearing from you here or by e-mail.
Thread Starter
Knut23
(@knut23)
Hi David!
Thanks!!! I have sent you an email with links to the PDFs.
Regards,
Richard
Thank you for following up with your contact information and a link to the documents you are working with. Here is a summary of my e-mail response to you:
The MLA parsing code tries to populate the following fields from a variety of sources:
/*
* Try to populate all the PDF-standard keys (except Trapped)
* Title - The document's title
* Author - The name of the person who created the document
* Subject - The subject of the document
* Keywords - Keywords associated with the document
* Creator - the name of the conforming product that created the original document
* Producer - the name of the conforming product that converted it to PDF
* CreationDate - The date and time the document was created
* ModDate - The date and time the document was most recently modified
*/
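CreationDate and ModDate are not plain text: the PDF specification stores them as strings of the form `D:YYYYMMDDHHmmSSOHH'mm'`, where every field after the year is optional and `O` is `+`, `-` or `Z`. If you map these keys somewhere that expects a real date, a minimal Python sketch of the conversion might look like the following. The name `parse_pdf_date` is hypothetical, and this is not a description of how MLA itself formats these values.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm'; every field after the year is optional.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Z+\-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)

def parse_pdf_date(value: str):
    """Convert a PDF date string (e.g. CreationDate) to a datetime.

    Returns None for strings that do not match or contain out-of-range
    fields. Without a time-zone part the result is naive, because the
    offset from UTC is unknown.
    """
    match = _PDF_DATE.match(value.strip())
    if not match:
        return None
    year, month, day, hour, minute, second, sign, tz_h, tz_m = match.groups()
    try:
        moment = datetime(int(year), int(month or 1), int(day or 1),
                          int(hour or 0), int(minute or 0), int(second or 0))
    except ValueError:  # e.g. month 13
        return None
    if sign is None:
        return moment
    if sign == "Z":
        return moment.replace(tzinfo=timezone.utc)
    offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
    if sign == "-":
        offset = -offset
    return moment.replace(tzinfo=timezone(offset))
```

Missing month and day default to 1 and missing time fields to 0, which follows the defaults the PDF specification gives for omitted fields.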
You can find more information about this in the “Metadata in PDF documents” section of the Settings/Media Library Assistant Documentation tab. I suggest you use those fields as a starting point for mapping the data. Here are the IPTC/EXIF Mapping Rules I came up with:
Title: template:([+pdf:Title+])
Caption: template:([+pdf:Subject+])
Att. Categories: template:([+pdf:Keywords,array+])
The three rules have a similar structure:
- The “template:” prefix (entered in the text box below “EXIF/Template Value”) accesses the pdf: values instead of the EXIF values.
- The values are surrounded by parentheses “(” and “)” so they will return an empty string for documents without meta data in the field and for other items such as images.
- I have selected “Replace” to overwrite the existing text, because a default Title was assigned to the items when they were uploaded. You can change this to “Keep” if you already have values in one or more of the fields that you want to retain.
The taxonomy rule also has the “,array” option to return multiple keywords as individual array elements that can be converted to taxonomy terms.
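To make the behavior of these rules concrete, here is a small Python model of what they do. It is only an illustration; MLA is written in PHP and its real template engine is far more general. The helper names are hypothetical, and two behaviors are assumptions rather than documented MLA rules: keywords are split on commas or semicolons, and an empty result never overwrites an existing value.

```python
import re

def render_optional(fields: dict, name: str) -> str:
    """Model of template:([+pdf:Name+]): the value, or "" when missing."""
    value = fields.get(name)
    return str(value) if value else ""

def split_keywords(raw: str) -> list:
    """Model of the ",array" option: one list element per keyword.

    Assumption: keywords are separated by commas or semicolons.
    """
    return [part.strip() for part in re.split(r"[,;]", raw) if part.strip()]

def apply_rule(current: str, new_value: str, mode: str) -> str:
    """Apply one mapping result using "replace" or "keep".

    Assumption: an empty result leaves the existing value unchanged.
    """
    if not new_value:
        return current
    if mode == "keep" and current:
        return current
    return new_value

# A document with a Title and Keywords but no Subject.
pdf = {"Title": "Annual Report", "Keywords": "tax; audit, 2023"}
# The default title assigned at upload is replaced.
title = apply_rule("annual-report-2023", render_optional(pdf, "Title"), "replace")
# No Subject, so the caption stays empty.
caption = apply_rule("", render_optional(pdf, "Subject"), "replace")
# Each keyword becomes a separate taxonomy term.
terms = split_keywords(render_optional(pdf, "Keywords"))
print(title, repr(caption), terms)
```

With “Keep” instead of “Replace”, the default title would survive, which is why “Replace” suits items that only have the title assigned at upload.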
I have also checked the “Enable IPTC/EXIF Mapping when adding new media” box so the rules are automatically applied when new items are uploaded.
After you enter and save the rules you can test them out on single documents by clicking the “Map IPTC/EXIF Metadata” link in the “Save” meta box on the Media/Edit Media screen. You can also use the Media/Assistant Bulk Edit area to experiment on several items at once. Of course, you can also use the “Map All … ” buttons in the IPTC/EXIF tab if you are feeling brave/lucky.
I hope that my response got you the results you wanted. I am marking this topic resolved, but please update it if you have any problems or further questions regarding the mapping of PDF meta data to WordPress fields and taxonomies. Thank you for your interest in the plugin.