WordPress.org

Import of XML (2 posts)

  1. honyk
    Member
    Posted 5 years ago #

    I'm trying to convert several articles into an XML format suitable for import into WP. I've found that the import procedure can read only specially formatted XML, which is impossible to produce with a plain XML-to-XML transformation in an XSLT processor. The importer requires some elements to be on separate lines (wp:category), while for others the opposite holds: the content of the content:encoded element must begin immediately after the opening tag and end immediately before the closing tag. Moreover, each wp:category must be on its own line; if not, only the first one is read. The XML import seems to be based on parsing lines (it expects an exact layout) instead of reading the real content of the XML tree...
    If I use the instruction <xsl:output indent="yes"/> in my XSLT template, I can import all the content easily, but I lose the category hierarchy. If the XML output is not indented, I can import the articles into the proper categories (though some workaround must be applied), but the content of the articles is empty.
    Is there any plan to improve the import filter so that it reads the actual content of elements without depending on line breaks?

  2. honyk
    Member
    Posted 5 years ago #

    In the end I use <xsl:output indent="yes"/> and "normalize" the resulting XML file with a VBScript that uses RegExp to remove the line breaks inside the wp:category and content:encoded elements.

    ' Removes whitespace (including line breaks) between tags inside
    ' wp:category and content:encoded elements.
    ' Arguments: input file path, output file path.
    Set fso = WScript.CreateObject("Scripting.FileSystemObject")
    
    If WScript.Arguments.Count < 2 Then
       WScript.Echo "No value given for one or more required parameters!"
       WScript.Quit
    End If
    
    inputFilePath = WScript.Arguments(0)
    outputFilePath = WScript.Arguments(1)
    
    ' Read the whole input file (1 = ForReading).
    Set inputFileStream = fso.OpenTextFile(inputFilePath, 1)
    input = inputFileStream.ReadAll
    inputFileStream.Close
    Set inputFileStream = Nothing
    
    input = stripElements(input, "wp:category")
    input = stripElements(input, "content:encoded")
    
    ' Write the result (2 = ForWriting, True = create the file if it is missing).
    Set outputFileStream = fso.OpenTextFile(outputFilePath, 2, True)
    outputFileStream.Write input
    outputFileStream.Close
    Set outputFileStream = Nothing
    
    ' Collapses every <tag>...</tag> element found in input onto a single line.
    Function stripElements(input, tag)
      Dim re, matches, i
      Set re = New RegExp
      re.Global = True
      re.Pattern = "<(" & tag & ")>([\s\S]*?)</\1>"
      Set matches = re.Execute(input)
      Set re = Nothing
      For i = 0 To matches.Count - 1
         input = Replace(input, matches(i).Value, stripSpace(matches(i).Value))
      Next
      stripElements = input
    End Function
    
    ' Removes any whitespace that sits between a closing ">" and the next "<".
    Function stripSpace(input)
      Dim re
      Set re = New RegExp
      re.Global = True
      re.Pattern = ">\s*<"
      stripSpace = re.Replace(input, "><")
      Set re = Nothing
    End Function
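
    To illustrate what the script above does, here is a small made-up fragment as it might come out of the XSLT processor with indent="yes". The child element name wp:cat_name and all the values are only assumptions for this sketch, not taken from a real export file, and the namespace declarations are left out.

    ```xml
    <wp:category>
      <wp:cat_name>News</wp:cat_name>
    </wp:category>
    <content:encoded>
      <![CDATA[<p>Article text</p>]]>
    </content:encoded>
    ```

    After normalization, the whitespace between the tags inside the two elements should be gone, so each element sits on a single line and the content begins right after the opening tag:

    ```xml
    <wp:category><wp:cat_name>News</wp:cat_name></wp:category>
    <content:encoded><![CDATA[<p>Article text</p>]]></content:encoded>
    ```

    Note that whitespace between tags is removed everywhere inside the matched element, so a space between two inline tags in the article HTML (for example "</b> <i>") would be lost as well.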
