Import of XML (2 posts)

  1. honyk
    Posted 7 years ago

    I'm trying to convert several articles into an XML format suitable for import into WP. I've found that the import procedure can read only specially formatted XML, which is impossible to produce with an XML->XML transformation in an XSLT processor. The code requires some elements to be on separate lines (wp:category), while other elements need the opposite: the content of the content:encoded element must begin immediately after the opening tag and end immediately before the closing tag. Moreover, each wp:category must be on a separate line; if not, only the first one is read. The XML import seems to be based on parsing lines (it expects an exact layout) instead of reading the actual content of the XML tree...
    If the instruction <xsl:output indent="yes"/> is used in my XSLT template, I can import all the content easily, but I lose the category hierarchy. If the XML output is not indented, I can import the articles into the proper categories (after applying a workaround), but the article content is empty.
    Is there any plan to improve the import filter so that it reads the actual content of elements regardless of line breaks?
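
    For comparison, a reader that walks the XML tree does not care where the line breaks fall. The minimal Python sketch below uses ElementTree (it is not the PHP WordPress importer) to parse the same hypothetical fragment twice, once indented and once collapsed onto a single line, and gets the same categories and content both times. The namespace URIs, the sample data and the helper names read_categories and read_contents are assumptions, loosely modeled on the wp:category structure of a WordPress export file.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; real export files declare their own (versioned) wp namespace.
WP = "http://wordpress.org/export/1.0/"
CONTENT = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical sample: a parent category, a child category and one post body.
INDENTED = f"""<rss xmlns:wp="{WP}" xmlns:content="{CONTENT}">
  <channel>
    <wp:category>
      <wp:category_nicename>news</wp:category_nicename>
      <wp:category_parent></wp:category_parent>
    </wp:category>
    <wp:category>
      <wp:category_nicename>local</wp:category_nicename>
      <wp:category_parent>news</wp:category_parent>
    </wp:category>
    <item>
      <content:encoded>
        Hello world
      </content:encoded>
    </item>
  </channel>
</rss>"""

# The same document with every line break and all indentation removed.
COMPACT = "".join(line.strip() for line in INDENTED.splitlines())


def read_categories(xml_text):
    """Return (nicename, parent) pairs for every wp:category element in the tree."""
    root = ET.fromstring(xml_text)
    ns = {"wp": WP}
    pairs = []
    for cat in root.iter(f"{{{WP}}}category"):
        name = cat.findtext("wp:category_nicename", default="", namespaces=ns).strip()
        parent = cat.findtext("wp:category_parent", default="", namespaces=ns).strip()
        pairs.append((name, parent))
    return pairs


def read_contents(xml_text):
    """Return the text of every content:encoded element, ignoring surrounding whitespace."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.iter(f"{{{CONTENT}}}encoded")]


print(read_categories(INDENTED) == read_categories(COMPACT))  # True
print(read_contents(COMPACT))  # ['Hello world']
```

    Both helpers return identical lists for INDENTED and COMPACT, including the parent/child link between the two categories, which is the layout-independent behavior the question asks the import filter to have.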

  2. honyk
    Posted 7 years ago

    In the end I used <xsl:output indent="yes"/> and "normalized" the resulting XML file with a VBScript regular expression that removes the whitespace (including line breaks) between tags inside the wp:category and content:encoded elements.

    Option Explicit
    Dim fso, inputFilePath, outputFilePath, inputFileStream, outputFileStream, input
    Set fso = WScript.CreateObject("Scripting.FileSystemObject")
    ' Expects two arguments: the input file and the output file
    If WScript.Arguments.Count < 2 Then
       WScript.Echo "No value given for one or more required parameters!"
       WScript.Quit 1
    End If
    inputFilePath = WScript.Arguments(0)
    outputFilePath = WScript.Arguments(1)
    ' Read the whole indented XSLT output (1 = ForReading)
    Set inputFileStream = fso.OpenTextFile(inputFilePath, 1)
    input = inputFileStream.ReadAll
    inputFileStream.Close
    ' Collapse the elements that the importer expects on a single line
    input = stripElements(input, "wp:category")
    input = stripElements(input, "content:encoded")
    ' Write the result (2 = ForWriting, True = create the file if it does not exist)
    Set outputFileStream = fso.OpenTextFile(outputFilePath, 2, True)
    outputFileStream.Write input
    outputFileStream.Close
    ' Finds every <tag>...</tag> element and strips the whitespace between its inner tags
    Function stripElements(ByVal text, ByVal tag)
      Dim re, matches, i
      Set re = New RegExp
      re.Global = True
      re.Pattern = "<(" & tag & ")>([\s\S]*?)</\1>"
      Set matches = re.Execute(text)
      For i = 0 To matches.Count - 1
         text = Replace(text, matches(i).Value, stripSpace(matches(i).Value))
      Next
      stripElements = text
    End Function
    ' Removes whitespace, including line breaks, between a ">" and the following "<"
    Function stripSpace(ByVal text)
      Dim re
      Set re = New RegExp
      re.Global = True
      re.Pattern = ">\s*<"
      stripSpace = re.Replace(text, "><")
    End Function
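
    On systems without Windows Script Host, the same normalization can be sketched in Python. This is a port of the VBScript above under stated assumptions, not a drop-in replacement: the function names are hypothetical, the input file is assumed to be UTF-8, and, like the original, it relies on regular expressions rather than an XML parser, so it assumes the target elements carry no attributes and are never nested inside themselves.

```python
import re


def strip_space(fragment: str) -> str:
    # Remove whitespace (including line breaks) between a ">" and the following "<".
    return re.sub(r">\s*<", "><", fragment)


def strip_elements(text: str, tag: str) -> str:
    # Lazily match each <tag>...</tag> element (same pattern as the VBScript)
    # and collapse the whitespace between the tags inside it.
    pattern = re.compile(r"<(" + re.escape(tag) + r")>([\s\S]*?)</\1>")
    return pattern.sub(lambda m: strip_space(m.group(0)), text)


def normalize(text: str) -> str:
    # Collapse the two elements the importer expects on a single line.
    for tag in ("wp:category", "content:encoded"):
        text = strip_elements(text, tag)
    return text


def convert_file(input_path: str, output_path: str) -> None:
    # Assumption: the XSLT output is UTF-8 encoded.
    with open(input_path, encoding="utf-8") as src:
        data = src.read()
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.write(normalize(data))
```

    For example, convert_file("export.xml", "export-normalized.xml") writes a normalized copy. Passing a callback to re.sub rewrites each matched element in a single pass, which replaces the Execute/Replace loop of the VBScript version.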
