WolkTools Dry Run Importer

Description

Dry Run Importer is a companion tool for a safe, repeatable loop of importing and updating WordPress content through WXR (WordPress eXtended RSS) files, including content you edited with the help of AI.

Unlike the standard WordPress importer or WP All Import, this tool is built to support the ongoing development and content-migration process, not just a single import. It lets you overwrite and merge content into an existing WordPress site, backed by a safety model designed to avoid silently breaking the site.

Dual tool: Web Checker + WP Plugin

The project ships as two complementary tools: a WP Plugin you install inside WordPress, and a Web Checker (wp-import-checker.html) that runs fully offline in your browser.
* Web Checker (XML test & CSV check): Before you import anything into WordPress, test the integrity of a WXR XML file and compare it against a source CSV, matching rows by a key column. It parses and validates WXR files with tens of thousands of items quickly, entirely in your local browser.
* WP Plugin (this plugin): Built on the same validation logic as the Web Checker, it applies data to a real site from inside the WordPress admin with strong safety: Dry Run (validation), diff preview, staged Chunk Import, automatic per-post backups before import, restore, and concurrent-import lock control.
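The CSV check reconciles WXR items with a source CSV through a key column. The Web Checker itself is an HTML page and its code is not shown here; the following is only a minimal Python sketch of the same idea, assuming the WXR 1.2 namespace and the post slug (wp:post_name) as the key. The names wxr_keys and reconcile are hypothetical and belong to neither tool.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: the file uses the WXR 1.2 namespace for <wp:*> elements.
WP_NS = "http://wordpress.org/export/1.2/"


def wxr_keys(wxr_text: str, field: str = "post_name") -> list[str]:
    """Return the text of <wp:{field}> for every <item> in a WXR document."""
    root = ET.fromstring(wxr_text)
    keys = []
    for item in root.iter("item"):  # RSS <item> elements carry no namespace
        node = item.find(f"{{{WP_NS}}}{field}")
        if node is not None and node.text:
            keys.append(node.text.strip())
    return keys


def reconcile(wxr_text: str, csv_text: str, key_column: str,
              field: str = "post_name") -> dict[str, list[str]]:
    """Compare WXR items with CSV rows using one key column."""
    in_wxr = wxr_keys(wxr_text, field)
    rows = csv.DictReader(io.StringIO(csv_text))
    in_csv = [row[key_column].strip() for row in rows if row.get(key_column)]
    counts = Counter(in_wxr)
    return {
        "missing_in_wxr": sorted(set(in_csv) - set(in_wxr)),
        "missing_in_csv": sorted(set(in_wxr) - set(in_csv)),
        "duplicates_in_wxr": sorted(k for k, n in counts.items() if n > 1),
    }
```

Keys present in only one source, or repeated inside the WXR, are the cases a reviewer would want to look at before importing.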

Key features

  • Dry Run (pre-flight) with a 3-color verdict (Green, Yellow, Red): Before writing any data, it thoroughly inspects XML structure, BOM, duplicate posts, missing taxonomies and parent/child hierarchy links, and the state of custom fields (ACF, etc.).
  • Visual diff preview: Shows the differences between an existing post and the incoming data field by field (core post fields, meta, and taxonomies), so you can tick only the items you want to update and overwrite-merge just those.
  • Per-post backup and safe Restore: Each post that will be updated is automatically backed up individually, capturing its state just before import. If the result is not what you intended, you can restore it to its pre-import state with one click from the admin screen. Before restoring, a “restore impact preview” is shown and the action proceeds only after explicit consent (a confirmation checkbox).
  • Staged import (Chunk Import): Even large WXR files are split into chunks and processed sequentially without causing timeouts. If an error occurs partway through, you can safely resume from the point of failure.
  • Concurrent-import control (GET_LOCK): Uses a MySQL GET_LOCK named lock so that multiple operators, or an AI script and a manual operation, cannot run imports at the same time and collide.
  • utf8mb4 / emoji safe: 4-byte UTF-8 strings, emoji, and surrogate pairs are verified to import and export intact, without mojibake or data loss.
  • Images don’t break (media relink & migration guard): Images you uploaded to the Media Library first are matched against the WXR attachments by filename (plus dimensions) and reused, remapping featured images, ACF image fields, and in-content references to your existing media (no empty duplicates). When migrating from another site and the library is missing an image, a safety gate stops the import and prompts you to “add the images first.”
  • Dry Run Exporter (safe export): An advanced export feature is also integrated, with detailed filtering by post type, date, status, author, taxonomy terms, and custom meta queries, plus personal-data (PII) warnings and automatic collection of related media.
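Staged Chunk Import with resume can be pictured as a loop that records a checkpoint after each completed chunk. This is a minimal Python sketch under stated assumptions, not the plugin's implementation: the checkpoint lives in a plain dict (task_state, hypothetical; the plugin keeps its task information in its own database tables), and import_one is a hypothetical callback that must be idempotent, because the items of a chunk that failed partway are imported again on resume.

```python
from typing import Callable, MutableMapping, Sequence


def chunked_import(items: Sequence[dict],
                   import_one: Callable[[dict], None],
                   task_state: MutableMapping[str, int],
                   chunk_size: int = 50) -> None:
    """Import items chunk by chunk, resuming from the last completed chunk."""
    start = task_state.get("next_index", 0)
    for begin in range(start, len(items), chunk_size):
        chunk = items[begin:begin + chunk_size]
        for item in chunk:
            # If this raises, the checkpoint still points at the start of
            # the current chunk, so a rerun repeats the whole chunk.
            import_one(item)
        # Record progress only after the entire chunk has succeeded.
        task_state["next_index"] = begin + len(chunk)
```

Checkpointing per chunk rather than per item keeps the number of state writes low, at the cost of re-running part of one chunk after a crash.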

Use cases

  1. Pushing a WXR diff from staging to production
    While production keeps running, safely push only the diff data — new posts or custom fields built on staging — into production, validating up front that no collisions occur before merging.
  2. Staged content migration during a site renewal
    When migrating thousands of old posts and images, prevent timeouts and silent data drop, and migrate progressively while preserving untouched content and media parent/child relationships.
  3. Selective restore of specific content from a backup WXR
    Without rolling the whole site back to an old snapshot, pick only specific broken posts, pages, or taxonomies from a WXR file and import/restore them individually.
  4. Agencies safely updating a client environment from WXR before delivery
    Without taking custody of the client’s WordPress site, export the content you built in a local or test environment as a WXR file for delivery, visually prove that it causes no collisions or overwrites of existing data, and then apply it.
  5. Temporary state preservation and rollback before theme or plugin changes
    Before a large theme change or page-builder migration, export just the affected content once; if a problem appears after applying, roll back individual posts from their per-post backups immediately to the original clean state.

Important Limitations

Please review the following limitations. The plugin takes an automatic per-post backup before import, but for first-time use on production we still recommend taking a full external database backup as a precaution.

  • Author merge: When author mapping information exists in the WXR, authors are safely linked to existing WordPress users (no silent drop as with the standard importer). However, the feature that auto-creates new users defined in the WXR is currently disabled.
  • Comment restoration: Comments and trackbacks associated with a post are safely restored, but for very large volumes (tens of thousands of comments) raise the PHP memory limit beforehand.
  • Unregistered custom post types/taxonomies: The source custom post types and taxonomies must already be registered (by a plugin or theme) and active on the destination site.
  • Multisite: Operation on WordPress Multisite has not yet been fully verified.
  • Restore scope: The plugin’s Restore feature returns the target post’s core fields, meta, and taxonomies to their original state. Attachment (image) files downloaded during the import are not deleted automatically. Also note that a restore replaces ALL postmeta of the target post with the values from the backup: if other plugins added metadata after the backup was taken, that metadata will be removed, so please be careful.
  • Object cache / search index: The plugin writes post meta directly as raw values (unserialized with allowed_classes => false so that no PHP objects are instantiated), so some hooks that normally run when meta is added or updated may not fire. If you use an external persistent object cache (Redis/Memcached) or a search index such as Elasticsearch, clear the cache or reindex once after the import completes. Integrators can use the wpsi_after_raw_meta_update action hook, which fires after raw meta writes.
  • Image IDs inside Gutenberg blocks: Remapping attachment IDs inside core media blocks (image, gallery, etc.) is supported, but ID remapping inside the complex proprietary JSON of third-party page builders (Elementor, etc.) may be limited.
  • Huge WXR over 100MB: WXR files larger than 100MB are not guaranteed to work. Because parsing uses the browser’s DOMParser, split the file beforehand to avoid freezing the browser tab or exhausting memory.
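The warning that a restore replaces all postmeta can be made concrete with a small comparison. This Python sketch approximates what a restore impact preview needs to report, assuming each meta key holds a single string value (WordPress allows several values per key, which this ignores); restore_impact is a hypothetical name, not the plugin's API.

```python
def restore_impact(current: dict[str, str],
                   backup: dict[str, str]) -> dict[str, list[str]]:
    """Classify meta keys by what a replace-all restore would do to them."""
    return {
        # Added after the backup (e.g. by another plugin): deleted on restore.
        "removed": sorted(k for k in current if k not in backup),
        # Present in both but changed since the backup: reverted on restore.
        "reverted": sorted(k for k in current
                           if k in backup and current[k] != backup[k]),
        # Deleted since the backup: written back on restore.
        "re_added": sorted(k for k in backup if k not in current),
    }
```

For example, if a SEO plugin added seo_title after the backup, restore_impact lists it under "removed", which is exactly the data loss the limitation above warns about.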

Supported Browsers

The Dry Run Importer admin UI is optimized for the latest two versions of Chrome, Edge, Firefox, and Safari. Older browsers such as Internet Explorer are not supported.

Installation

  1. Copy the wolktools-dry-run-importer directory into wp-content/plugins/ as-is (or install the ZIP from “Plugins > Add New > Upload Plugin” in the admin).
  2. Activate “Dry Run Importer” from “Plugins” in the WordPress admin.
  3. Open “Tools” > “Dry Run Importer” in the admin.
  4. Drag and drop your WXR XML file, and first run a “Dry Run” (validation).
  5. To use the export feature, open “Tools” > “Dry Run Exporter”, set your filters, confirm the count and size under “Preview”, and then download.

FAQ

Is it really safe to use in production?

The plugin is designed around a safety model: it always takes an automatic backup before import, automatically blocks dangerous content, and lets you restore at any time. That said, the first time you use it on production, we recommend first trying a “Dry Run” on a test/staging environment, reviewing the warning messages, and securing a full external DB/file backup before use.

Does running a Restore also delete downloaded image files?

No. Restore only returns the post’s text, metadata, and taxonomy relationships to the backup point. Image files already saved to the Media Library are left in place and are not deleted.

What happens to my data when I uninstall the plugin?

When you uninstall, the temporary task information, per-post backup logs, and processing-log tables that the plugin created are safely and completely removed from the database, and the wpsi_* settings options are erased as well. Your normal posts and Media Library images are never deleted.

Which matching key should I choose for collision detection?

We most recommend matching by “Slug.” Matching by “Title” risks accidentally overwriting a different article that happens to share the same title. Use matching by “ID” only when you are certain the post IDs intentionally coincide between source and destination (for example, syncing staging to production).
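Why “Title” is risky becomes clear when you index existing posts by each candidate key. This is a minimal Python sketch, assuming posts are plain dicts with WordPress field names (ID, post_type, post_name, post_title); the helper names are hypothetical. Slugs are scoped by post type here because WordPress generally keeps slugs unique within a post type (for hierarchical types such as pages, only among siblings), which this sketch simplifies.

```python
from collections import defaultdict
from typing import Callable, Hashable

Post = dict
KeyFn = Callable[[Post], Hashable]


def by_slug(post: Post) -> Hashable:
    return (post["post_type"], post["post_name"])


def by_title(post: Post) -> Hashable:
    return (post["post_type"], post["post_title"])


def by_id(post: Post) -> Hashable:
    return post["ID"]


def find_collisions(incoming: list[Post], existing: list[Post],
                    key: KeyFn) -> list[list[int]]:
    """For each incoming post, list the IDs of existing posts with the same key."""
    index: dict = defaultdict(list)
    for post in existing:
        index[key(post)].append(post["ID"])
    # More than one match means the key is ambiguous and any overwrite is a guess.
    return [list(index.get(key(post), [])) for post in incoming]
```

With two existing posts titled “News”, matching by title returns both IDs for one incoming post, while matching by slug returns exactly one, and matching by ID finds nothing unless the IDs were intentionally kept in sync.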

How should I choose between the Web Checker and the WP Plugin?

If WordPress isn’t installed yet, or you want to check WXR structure or reconcile against a CSV without affecting a running WordPress site, use the “Web Checker” — a single self-contained HTML page that runs fully offline in your browser. It is distributed separately from this plugin (it is not included in the plugin ZIP); you can find the download link on the plugin page. When you want to actually bring data into WordPress safely and use the per-post backup and restore features, use the “WP Plugin.”

Can images be downloaded automatically from an external site?

Yes. However, downloading images makes external HTTP requests and creates new attachments in the destination Media Library, so first use the Dry Run to confirm that the image URLs are reachable and safe (including the SSRF-prevention filter), and then run the download deliberately.
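An SSRF-prevention check typically refuses URLs that would make the server fetch from itself or from an internal network. The plugin's own filter is not shown in this document; this is a generic Python sketch of the technique, assuming only http/https should be allowed and that every resolved address must be public. The names resolve_all and is_safe_image_url are hypothetical.

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit


def resolve_all(host: str) -> set[str]:
    """Return every address the host resolves to (IP literals are returned as-is)."""
    try:
        ipaddress.ip_address(host)
        return {host}
    except ValueError:
        pass
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_safe_image_url(url: str,
                      resolve: Callable[[str], Iterable[str]] = resolve_all) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        ips = [ipaddress.ip_address(a) for a in resolve(parts.hostname)]
    except (OSError, ValueError):
        return False
    # Rejects loopback, private, link-local (e.g. cloud metadata), reserved, etc.
    return bool(ips) and all(ip.is_global for ip in ips)
```

A check done before the fetch still leaves a DNS-rebinding window, so stricter implementations also pin the resolved address for the actual request and re-check every redirect target.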

If I upload images manually, will they be linked to the WXR? (recommended flow)

Yes. This is the most reliable migration procedure.

  1. Upload the images manually to the destination Media Library (drag and drop in the admin, etc.). That alone lets WordPress correctly handle the storage path, thumbnails, and ownership/permissions (no need to copy the uploads folder structure or chown on the server).
  2. Import the WXR. With the advanced option “Link to existing media by filename (don’t create duplicates)” (ON by default), each attachment is matched against existing media by filename (plus image-dimension matching) and linked to the existing ID without creating duplicates. Featured images, ACF image fields, and in-content image references are remapped to that existing media.
  3. The pre-flight before import shows “Linked to existing media: N / same filename in multiple places create new / dimension mismatch create new,” so you always confirm before importing.

To prevent mismatches, four guardrails apply: (a) only exact filename matches are linked; (b) if more than one existing file has the same filename, nothing is auto-linked and a new attachment is created; (c) if the WXR carries image dimensions, they are compared against the existing file and linking is skipped on a mismatch; and (d) every linking decision is shown in the pre-flight check for review. Turn the option OFF to create every attachment as new, as before. The filter wpsi_attachment_relink_policy (strict / filename-only / default) changes the matching strictness.
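The relink guardrails can be read as a small decision function: exact filename, no ambiguity, matching dimensions, and a summary for the pre-flight review. This Python sketch covers only the default behavior described above, not the strict or filename-only policies, whose exact rules are not documented here. Media, Decision, relink, and preflight_summary are hypothetical names, and comparing dimensions only when both sides know them is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Media:
    """An item already in the Media Library (hypothetical shape)."""
    id: int
    path: str  # relative to uploads, e.g. "2024/05/hero.jpg"
    width: Optional[int] = None
    height: Optional[int] = None


@dataclass(frozen=True)
class Decision:
    action: str              # "link" or "create_new"
    media_id: Optional[int]  # existing attachment ID when action == "link"
    reason: str


def basename(url_or_path: str) -> str:
    """Filename part of a URL or path, ignoring any query string."""
    return PurePosixPath(urlsplit(url_or_path).path).name


def relink(wxr_url: str, width: Optional[int], height: Optional[int],
           library: list[Media]) -> Decision:
    name = basename(wxr_url)
    # Exact, case-sensitive filename match only; no fuzzy matching.
    matches = [m for m in library if basename(m.path) == name]
    if not matches:
        return Decision("create_new", None, "no match")
    # The same filename in two or more files is ambiguous: never guess.
    if len(matches) > 1:
        return Decision("create_new", None, "ambiguous filename")
    match = matches[0]
    # Compare dimensions only when both sides know them (an assumption).
    dims_known = None not in (width, height, match.width, match.height)
    if dims_known and (width, height) != (match.width, match.height):
        return Decision("create_new", None, "dimension mismatch")
    return Decision("link", match.id, "filename match")


def preflight_summary(decisions: list[Decision]) -> Counter:
    """Count decisions by reason so they can be reviewed before importing."""
    return Counter(d.reason for d in decisions)
```

Every uncertain case falls back to creating a new attachment, so a wrong guess costs a duplicate file rather than a post pointing at the wrong image.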

How does the Dry Run Exporter work?

Dry Run Exporter is based on WordPress core’s export_wp(): it applies your filters on the server and generates the WXR XML file on the fly. Before exporting, always click “Preview” to confirm the number of posts and attachments to be exported, the estimated file size, and PII warnings such as author email addresses.

The Attachment mode (how related media is exported) offers three behaviors: “auto” (auto-collect only media related to the selected posts), “include” (include all site media), and “exclude” (exclude media).
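The three Attachment modes can be sketched as a selection function. This Python sketch assumes that “related” means an attachment whose post_parent is one of the selected posts or that is set as a selected post's featured image; the real exporter may follow more relations (ACF image fields, in-content references), which are not modeled. select_attachments and the thumbnail_id key are hypothetical (WordPress itself stores the featured image in the _thumbnail_id post meta).

```python
from typing import Iterable


def select_attachments(mode: str, selected_posts: list[dict],
                       all_attachments: Iterable[dict]) -> list[dict]:
    """Pick the attachments to export for the given Attachment mode."""
    if mode == "exclude":
        return []
    if mode == "include":
        return list(all_attachments)
    if mode == "auto":
        post_ids = {p["ID"] for p in selected_posts}
        featured = {p["thumbnail_id"] for p in selected_posts if p.get("thumbnail_id")}
        # Attached to a selected post, or used as a selected post's featured image.
        return [a for a in all_attachments
                if a["post_parent"] in post_ids or a["ID"] in featured]
    raise ValueError(f"unknown attachment mode: {mode!r}")
```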

Note: WXR XML files contain author email addresses and display names. Handle export files with appropriate care for personal data.

Contributors & Developers

“WolkTools Dry Run Importer” is open source software.

Changelog

1.0.0

  • First stable release. Plugin renamed to Dry Run Importer (companion exporter: Dry Run Exporter).
  • Migration media safety gate: when the file comes from another site (URL mismatch) and attachments have no matching file in the Media Library, the import button is blocked with a clear warning to upload the images first — preventing empty/broken attachments. Explicit consent checkbox to override.
  • Verdict now downgrades a green result to yellow while the media safety gate is active, so the top banner never says “safe” while images would break.
  • Exporter: post-status filter defaults to “Published only” (attachments and ACF/structure definitions are still collected via a status-agnostic path); internal statuses (e.g. acf-disabled) no longer leak into the status checkboxes.
  • UI cleanup: removed the demo loader, hid the optional Sheet Data Match tab by default (re-enable via wpsi_enable_csv_match), removed the redundant old/new URL fields from Advanced Settings, and removed the on-screen version badge. Unified importer/exporter heading sizes.
  • Attachment filename relink (default ON): match WXR attachments to media you already uploaded by filename + dimension check, reuse instead of creating duplicates, and remap featured images / ACF image fields / content references to the existing media. Guardrails: exact basename only, ambiguous (same filename in 2+ files) skipped, dimension/size mismatch skipped, shown in preflight. Filter wpsi_attachment_relink_policy.
  • Bulk restore: restore every backup in a task in one dependency-safe pass (attachments and their parent posts are restored in dependency order) from the History tab.
  • Exporter now emits <wp:post_mime_type> for attachments so imported media renders without guessing the mime type.
  • Importer backfills attachment mime type on update; thumbnail regeneration caps chunk size to stay under execution limits.

0.9.0-beta

  • Fixes for four regressions relative to the WP standard importer: hierarchical term parents, post_author, comments / pingbacks, and attachment.post_parent.
  • Restore impact preview (API and UI) with a consent gate before a destructive restore.
  • MySQL GET_LOCK concurrent-import protection.
  • Versioned DB migration runner with idempotent dbDelta convergence.
  • utf8mb4 / 4-byte UTF-8 round-trip verified (emoji, surrogate pairs, multibyte content).
  • Fault-injection smoke test (crash mid-chunk, then resume) wired into CI.
  • WP-Cron daily log purge (default 30 days, filterable).
  • Distribution hardening: .distignore + release.yml zip assertion excludes PoC, tests, history, and dev configs.
  • Plugin Header finalized for beta distribution (GPLv2 license).

0.2.0

  • New: Dry Run Exporter — filtered WXR export under Tools > Dry Run Exporter.
  • Filter exports by post type, date range, post status, author, taxonomy terms, and meta query.
  • Three attachment modes: auto-collect related, include all, or exclude all.
  • Preview before export shows post count, attachment count, estimated size, and PII warnings.
  • REST API endpoints: /export-preview, /export, and /export-filters.
  • Japanese translation for all exporter strings.

0.1.0

  • Initial experimental implementation.
  • Dry Run, diff preview, selective import, post backup/restore, media checks, media downloads, and initial media remapping.
  • Optional daily auto purge for backup rows older than the configured retention window (opt-in toggle in the admin screen).
  • New wpsi_after_raw_meta_update action hook fires after raw post meta writes for integrators (ACF, search index, persistent object cache).
  • WordPress privacy exporter/eraser integration for backup rows containing a requested email address.
  • Bundled demo WXR and first-run guidance in the admin screen.