WordPress.org

Forums

Importing MT-like data problem (25 posts)

  1. Lorelle
    Member
    Posted 9 years ago #

    I'm working on importing stuff from my html pages into WordPress via the MT import script and I seriously need help at the forum discussion post here.

    If anyone has any experience with importing into the WP database tables or converting data to the MT import layout for importing through WP, PLEASE help me.

    Thanks,

    Lorelle

  2. Kitten
    Member
    Posted 9 years ago #

    Ok,

    1. You can't import HTML pages into a database as records. It's not a field delimited format. Really. It isn't.

    2. Don't try and munge it into MT format first. Why? Because it's not necessary.

    3. Convert the data into WP compatible SQL statements.

    3a. How do you find out what WP compatible SQL statements are? Easy. Export your wp_posts table and look at how it's formatted. (Don't use 'extended inserts' format, it's less readable.)

    4. Then import those SQL statements into the database via phpmyadmin or the mysql command line.

    5. Enjoy.

  3. Lorelle
    Member
    Posted 9 years ago #

    Thanks for the reply, Kitten.

    I have HTML data in the fields, not whole HTML to be imported.

    I've exported the wp_posts table and examined it thoroughly. Is there a way to get around the specific order of the table, or can I put in any "hints" that say "this is this, so ignore the order"? Make sense?

    I can go through and change the order of the import file, but we're talking tedious stuff. Munging the stuff into MT format seemed best because the import script didn't seem too fussy over the order of the information as long as the "title" was there like "TITLE: blah blah" and BODY: "blah blah blah".

    I've been at this for three weeks, so any help deserves hugs and flowers.

  4. Kitten
    Member
    Posted 9 years ago #

    Databases aren't mind readers, you have to tell them what is where. That's done in the insert statements.

    INSERT INTO wp_posts ( foo, bar, baz, etc...) VALUES ( one, two, three, etc... );

    So that foo is one, bar is two, and baz is three.

    Now you just have to have valid WP field names in the first set of ( ) (the 'where') and the post info in the second set, (the 'what'). As long as the what is mapped to the where, and the where exists in your table, it'll import and be where you want it.
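A minimal sketch of the point above, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made only so the example runs anywhere). The table is a cut-down, hypothetical version of wp_posts with just a few columns. It shows that when the column names are listed in the INSERT, their order does not have to match the order in the table.

```python
import sqlite3

# Stand-in table: SQLite instead of MySQL, with only a handful of
# columns (a real wp_posts table has many more).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts ("
    "ID INTEGER PRIMARY KEY, post_author INTEGER, post_date TEXT, "
    "post_content TEXT, post_title TEXT, post_excerpt TEXT)"
)

# The columns are named in a different order than the table defines them.
# Each value lands in the column listed at the same position (the 'what'
# is mapped to the 'where').
conn.execute(
    "INSERT INTO wp_posts (post_excerpt, post_title, post_author) "
    "VALUES (?, ?, ?)",
    ("A short excerpt", "Post Title", 1),
)

row = conn.execute(
    "SELECT post_author, post_title, post_excerpt, post_content FROM wp_posts"
).fetchone()
print(row)
```

Columns left out of the INSERT (post_content here) simply come back empty, so a partial column list is enough for a first test import.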

  5. Lorelle
    Member
    Posted 9 years ago #

    Brilliant. I forgot I can be specific within the INSERT...after weeks of this, the mind is fried.

    Thanks,

    Lorelle

  6. Lorelle
    Member
    Posted 9 years ago #

    Ah, now I remember why this didn't work before. Remember, I have HTML data with quotes around attributes. Using INSERT, I am limited to using field separators of "," and a line break as the end of the record.

    Using LOAD DATA INFILE I can establish distinctive separators with the FIELDS TERMINATED BY... etc.

    Any way of combining the two so I have something that says INSERT X, Y, Z... using FIELDS TERMINATED BY.... INTO wp_posts....

    EDIT: I guess I can set it up to be concurrent with every field in there with nothing in it in order to use the LOAD, but...is there a combo?

  7. Kitten
    Member
    Posted 9 years ago #

    >Ah, now I remember why this didn't work before.
    >Remember, I have HTML data with quotes around attributes.

    Why isn't your data properly escaped? Then it'd be properly inserted into the database.

    Make a post in WP with some HTML in it, dump it & look at it. Then you'll see what the format should be.

  8. Lorelle
    Member
    Posted 9 years ago #

    >>>Why isn't your data properly escaped?<<<

    I'm not sure what you mean by escaped. And I guess I'm not explaining myself. I did a dump from the wp-table and examined it. I can do a search and replace to insert all the appropriate separators.

    The problem is that the data isn't in the right order. So I can't do an INSERT and am stuck doing a LOAD. LOAD will not allow me to be specific with the field names like INSERT does, so I would have to manually go through the data and change what goes where.

    I can manually go through the data and change the order of the information, but we are talking over 500 articles that aren't short blogs. I can also blow out all the title, author name, excerpts, etc., and just import the data straight in, and then I will have to go through every thing in the database to add in the other information. More manual labor.

    If it weren't for the quotes around the html attributes, I could very easily import this into excel or something and then realign the order. I might be able to do this in WordPerfect by creating a merge file and then merging it into a table and then realigning the table and getting it back into a format that will work with the INSERT. This is a lot of work, but if it is the only choice, I'll do it.

    I'm trying to work with what I have to avoid the manual labor. If I can't, then a lot of other people who are trying to do what I'm doing, with a lot more data than I have, need to know this information, too.

    Since it never began in a database, I'm trying to format it in a fashion to get it into the database. I really feel like I'm a pioneer in this, but I can't be. Any and all help is appreciated, Kitten.

  9. Anonymous
    Member
    Posted 9 years ago #

    I think what Kitten means by 'escaped' is inserting backslashes before the quotes; you could do that with find-replace in any decent text-editor.

  10. Lorelle
    Member
    Posted 9 years ago #

    Thanks. If it is that simple, then this should be really simple to import. I hope it is as simple as that.

  11. Kitten
    Member
    Posted 9 years ago #

    Non-escaped data: this is "something" that screws up my "importing"

    Escaped data: this is \"something\" that screws up my \"importing\"

    the second you can stick between double quotes and import it just fine.
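The escaping described above could be scripted rather than done by hand in an editor. This is a rough sketch, assuming MySQL's default handling of backslash escapes inside double-quoted strings; `escape_for_sql` is a hypothetical helper name, and a client library with parameterized queries would normally do this work for you.

```python
def escape_for_sql(text: str) -> str:
    """Backslash-escape text so it can sit between double quotes.

    Backslashes are doubled first so that the backslashes added in
    front of the quotes are not themselves escaped a second time.
    """
    return text.replace("\\", "\\\\").replace('"', '\\"')


raw = 'this is "something" that screws up my "importing"'
print(escape_for_sql(raw))
# prints: this is \"something\" that screws up my \"importing\"
```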

  12. Lorelle
    Member
    Posted 9 years ago #

    I'm working on a write up for the codex on all of this, so I need to have some clarity.

    The proper form for using the INSERT with the specific fields listed with the \" acting like an escape for all the quote marks would look like this example:

    INSERT INTO wp-posts (post_author, post_date, post_content, post_title, post_excerpt)
    VALUES ("1", "2005-1-14", "<p class="red">Something in \"red\" here.

    <p id=\"fred\">blah blah</p>","Post Title", "This is the excerpt blah blah")

    Am I even on the right track?

  13. Kitten
    Member
    Posted 9 years ago #

    Close, but your 3rd field would end right before the letter 'r' (after the '=') since that quote is not escaped.

    More like:

    INSERT INTO wp_posts (post_author, post_date, post_content, post_title, post_excerpt)
    VALUES ("1", "2005-1-14", "<p class=\"red\">Something in \"red\" here.
    <p id=\"fred\">blah blah</p>","Post Title", "This is the excerpt blah blah");

    Notice that the quotes enclose all the data; anytime a quote appears in the data, it needs to be escaped. Here's a trick to finding out if your data's correctly formatted:

    Import it.

    If you get errors, note the line number, then go look at it, fix it. Wash, rinse, repeat, until it imports without error.
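For more than a handful of posts, the statement shown above could be generated instead of typed. The sketch below is only an illustration: `sql_quote` and `build_insert` are hypothetical helper names, the escaping assumes MySQL's default backslash handling, and the table and column names are pasted in as-is, so they must come from a trusted source.

```python
def sql_quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_insert(table: str, record: dict) -> str:
    """Build one INSERT statement; column order follows the dict's order."""
    columns = ", ".join(record)
    values = ", ".join(sql_quote(str(v)) for v in record.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values});"


post = {
    "post_author": 1,
    "post_date": "2005-01-14",
    "post_content": '<p class="red">Something in "red" here.</p>',
    "post_title": "Post Title",
    "post_excerpt": "This is the excerpt blah blah",
}
statement = build_insert("wp_posts", post)
print(statement)
```

Writing one statement per line to a file keeps the "note the line number, fix it, repeat" loop workable, since an error then points at a single post.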

  14. Lorelle
    Member
    Posted 9 years ago #

    I worked on the code to make it appear in the post so many times, gee I wish there was a post preview here. I finally gave up and the slashes around the "red" were culled by the software onboard here.

    Thanks for figuring it out. Now to sleep and give it a go in the morning. I'm eternally grateful and ready to give this a fling.

  15. Lorelle
    Member
    Posted 9 years ago #

    ARGHHH!

    It won't work and it won't give me a specific error. Here is part of the code, if the forum's software won't rewrite some of it.

    INSERT INTO wp-posts ("post_excerpt","post_title","post_author","post_date","post_category","post_content","post_status")
    VALUES ("Internet Tips - Popup Spam Ads Spyware Gator Hotbar - how to get rid of, eliminate, and kill these nuisance programs on your computer.","Popups, Spammers, Spyware, Gator, GAIN, and Adware - Fighting Back","Lorelle VanFossen","11/25/2004 03:31:05 PM","Learn","<p>Determined to fight back against those who abuse the benefits of the Internet and the Web? As <a title=\"information and articles on nature photography, traveling, and writing\" href=\"../../about.html\">nature photographers and writers</a> traveling and living in a computer world, .........)

    And goes on and on. If it told me that it imported X files, or to X line, I'd have a point to refer to. But the error says SQL error, please refer to the online documentation...blah blah. Nothing specific.

    I escaped the quote marks thoroughly. This is only a three "post" test run. So it isn't too big. The text file is like 65K.

    Ideas?

  16. Lorelle
    Member
    Posted 9 years ago #

    I really need some help with this. I'm on a plane in less than 24 hours and will be gone for almost a month. With these uploaded, I can work on them from the road.

  17. Lorelle
    Member
    Posted 9 years ago #

    I really want to write up documentation for the codex on how to import html stuff to the database, through WP or not, so people like me who are moving from a normal site and not a blogging tool can get our stuff into WordPress and working.

    I'm stuck here, though. It's a MySQL issue, but the data being transferred is prepared for WordPress. Can anyone help me?

  18. jokeofalltrades
    Member
    Posted 9 years ago #

    It looks like you gave up on using the MT import method, and I can't even get WP running to test anything to help with the rest, BUT... if you do feel like trying the MT import method again, here's an example of an MT 2.6 export file. Note that nothing is escaped - quotes in the HTML tags are just fine.

    AUTHOR: Michael
    TITLE: Ohwha Tador Kiam
    STATUS: Publish
    ALLOW COMMENTS: 2
    CONVERT BREAKS: 0
    ALLOW PINGS: 0
    PRIMARY CATEGORY:

    DATE: 10/05/2002 03:10:03 PM
    -----
    BODY:
    <p>
    <strong><acronym title="Law School Admissions Test">LSAT</acronym></strong>: done. Please don’t ask me how I did. I don’t get the score for weeks, and right now I’d like to focus on killing my brain cells with beer. I will admit, though, that I closed my writing sample with: <i>Ceterum censeo Carthaginem esse delendam</i>. I am an &uuml;berdork.
    </p>
    <p>
    My friend from work gets married tonight within stumbling distance of my apartment. Guess who’s going to enjoy himself at the reception tonight?
    </p>
    <p>

    </p>
    <p>
    Me, in case that wasn’t clear.
    </p>
    <p class="np">
    NP: The Autumns, <i>Rose Catcher</i>
    </p>
    -----
    EXTENDED BODY:

    -----
    EXCERPT:

    -----
    KEYWORDS:

    -----
    COMMENT:
    AUTHOR: ben
    EMAIL: ???????
    IP: ???????
    URL:
    DATE: 10/07/2002 06:58:26 PM
    So...How did you do?
    -----
    COMMENT:
    AUTHOR: Michael Hoke
    EMAIL: ???????
    IP: ???????
    URL: http://www.jokeofalltrades.com
    DATE: 10/08/2002 08:58:34 AM
    <strong>Beer</strong>: 10
    <strong>Brain</strong>: 0

    <strong>Winner</strong>: Beer!!!

    I assume that’s what you were asking about, because if you were asking about something else, say, oh, the <strong><acronym>LSAT</acronym></strong>, I’d have to beat you. Severely.
    -----

    The newline character ('\n') does NOT have to be typed - you just need to have a new line begin after the dashes (I don't know if saving the file in Windows will mess it up - if you can use TextPad or another editor that will allow you to save the export file with Unix-style newlines, that might be safer, but I don't know, as I can't get WP up to test). Also, I have no idea whether the import script in WP requires the elements to be in a certain order, but MT spit them out as above. Hope this helps.

    And yeah, this forum needs a post preview.

    --M
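A record in the layout shown above could be produced by a short script instead of search and replace. This is a sketch under assumptions: `mt_record` is a hypothetical helper, the fixed values for comments, breaks, and pings are copied from the sample, comments are left out, and the closing line of 8 dashes follows what is reported later in this thread rather than any official specification.

```python
def mt_record(author, title, date, body, status="Publish", excerpt=""):
    """Return one post in the MT-style import layout as a string."""
    header = [
        f"AUTHOR: {author}",
        f"TITLE: {title}",
        f"STATUS: {status}",
        "ALLOW COMMENTS: 2",
        "CONVERT BREAKS: 0",
        "ALLOW PINGS: 0",
        "PRIMARY CATEGORY:",
        f"DATE: {date}",
    ]
    sections = [
        "\n".join(header),
        f"BODY:\n{body}",
        "EXTENDED BODY:\n",
        f"EXCERPT:\n{excerpt}",
        "KEYWORDS:\n",
    ]
    # Sections are separated by 5 dashes; the record ends with 8 dashes.
    return "\n-----\n".join(sections) + "\n-----\n--------\n"


record = mt_record(
    author="Fred",
    title="Ohwha Tador Kiam",
    date="10/05/2002 03:10:03 PM",
    body='<p class="np">NP: The Autumns, <i>Rose Catcher</i></p>',
)
print(record)
```

The quotes in the HTML body go in untouched, which matches the note above that nothing in this format needs escaping.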

  19. Lorelle
    Member
    Posted 9 years ago #

    Oh, that's lovely. I still have the test file I did for the MT setup. Maybe it was the \n that was messing up the import.

    I'll give it a try first thing in the morning before my flight.

    Thanks!

  20. Lorelle
    Member
    Posted 9 years ago #

    Tried it and nothing is working. I can't get anything to import and I've tried very simple things. I can search and replace, and do all kinds of other things but I can't get the import working.

    There must be some little thing I'm slipping up on. I've written the whole thing out in notepad, checked all the quote marks, commas, semicolons, etc., and I can't get anything to import. Ideas?

    I haven't had a chance to put things back into MT format and give that a try.

  21. Lorelle
    Member
    Posted 9 years ago #

    DO A LITTLE DANCE. THE TRUMPETS SOUND. SUCCESS IS MINE!!!

    Okay, so it worked. I started over from scratch with a few little files and it finally worked. I don't know exactly what it was that stopped almost the identical process for the past two months, but I finally got the import-mt.php to work on my html stuff. A lot of search and replace, but now I'm the queen of search and replace!

    Note
    One of the things that might have caused my problems with the import is that the:

    AUTHOR: Fred
    TITLE: Ohwha Tador Kiam
    STATUS: Publish
    ALLOW COMMENTS: 2
    CONVERT BREAKS: 0
    ALLOW PINGS: 0
    PRIMARY CATEGORY:
    DATE: 10/05/2002 03:10:03 PM

    MUST be in this order. It can't be Title > Author or Status > Title > Author. It has to be in this order. This might have been part of the screwups since somewhere I read that the order wasn't important. Well, folks, IT IS.

    Thanks to everyone for walking me through this 100 times.
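If the header lines were produced in some other order, they could be rearranged by a script rather than by hand. The sketch below assumes the order reported above is the one the import script wants; `HEADER_ORDER` and `reorder_header` are hypothetical names, and any line whose key is not in the list is dropped.

```python
# Order taken from the working example in this thread.
HEADER_ORDER = [
    "AUTHOR",
    "TITLE",
    "STATUS",
    "ALLOW COMMENTS",
    "CONVERT BREAKS",
    "ALLOW PINGS",
    "PRIMARY CATEGORY",
    "DATE",
]


def reorder_header(lines):
    """Return the 'KEY: value' lines sorted into HEADER_ORDER."""
    fields = {}
    for line in lines:
        # Split on the first colon only, so times like 03:10:03 survive.
        key, _, value = line.partition(":")
        fields[key.strip().upper()] = value.strip()
    ordered = []
    for key in HEADER_ORDER:
        if key in fields:
            value = fields[key]
            ordered.append(f"{key}: {value}" if value else f"{key}:")
    return ordered


scrambled = [
    "TITLE: Ohwha Tador Kiam",
    "DATE: 10/05/2002 03:10:03 PM",
    "STATUS: Publish",
    "AUTHOR: Fred",
]
print("\n".join(reorder_header(scrambled)))
```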

  22. markusz
    Member
    Posted 9 years ago #

    I would like to convert my static HTML pages to WordPress, too. How did you convert the HTML to an MT export file? Any recommendations?

    Thanks.

  23. Lorelle
    Member
    Posted 9 years ago #

    I'll be adding an article about the process to the codex soon, but in the interim, you need to copy either all of the HTML from your pages OR just the specific information you will be putting into the MT format into a sophisticated text editor or a word processor (if you really know what you are doing). Then begins a very long process of search and replace to remove the excess and add the formatting to match the MT import layout.

    As mentioned above, the first part of the structure must match exactly in order as shown. If you don't have comments, pings, or any specific items, these can be ignored and dropped, but the first part of the structure must be exactly as shown.

    Unfortunately, this means that after you do all these wonderful search and replaces (and hopefully your original html layout is well defined and consistent...making this process very easy) you have to go through the entire thing manually and make sure that everything is lined up correctly.

    I really recommend doing no more than 50 "posts" at a whack, just in case you screw up. The first couple batches will be learning lessons, and the rest will go very fast once you get the process figured out. Take notes as you go.

    Really watch to make sure each field section is separated by 5 dashes and the end of the record is separated from the next by 8 dashes in a line.

    I'll work on the rest of the details later. Just got back from a month on the road traveling and I have to find my desk under my luggage.

    Do take lots of notes as you work and either post it here or email me directly to let me know what you learned as you went through the process so I can add it to my notes.

    Good luck. It's actually time consuming but easier than you might think.
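Two of the checks described above, batches of no more than 50 posts and separator lines of exactly 5 or 8 dashes, lend themselves to a small script. This is a sketch only: `batches` and `suspicious_separators` are hypothetical helpers, and the check flags any all-dash line of another length, which could also catch a legitimate line of dashes inside a post body.

```python
def batches(records, size=50):
    """Yield successive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def suspicious_separators(text):
    """Return (line number, line) for dash-only lines that are not 5 or 8 long."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped and set(stripped) == {"-"} and len(stripped) not in (5, 8):
            problems.append((number, stripped))
    return problems


sample = "BODY:\nhi\n----\nEXCERPT:\n-----\n--------\n------"
print(suspicious_separators(sample))

sizes = [len(chunk) for chunk in batches(list(range(120)))]
print(sizes)
```

Running the separator check on each batch before importing gives a line number to look at, which the import itself may not provide.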

  24. markusz
    Member
    Posted 9 years ago #

    Thanks for the tips. I will begin the formatting in the next few days and will let you know.
    As for pictures included in the blog entries, is there something I should be aware of or will the <img> tags just work fine?

    p.s. impressive amount of information on your pages!!!

  25. Lorelle
    Member
    Posted 9 years ago #

    On my web pages? Thanks. It's a lot of years of hard work. Once heralded as one of the largest personal websites on the net...until the bloggers took over. Damn them...;-)

    As for the pictures, I left the links all as they were (though another search and replace through the data would change the folders and such). But then I have a very structured way of sorting all my images. All "images" such as gifs which are non-photos go into a directory called "images" with specific subfolders depending upon their use. All photographs go into a "photos" directory with subfolders as per their use. I'm keeping the same hierarchy with the move.

    As described in this thread, if you put the base reference for the images in the header, then it will find your images from there. Very helpful.

    If you don't have your images well organized but dumped in here and there, fix it now or it will haunt you.

    And be sure and move away from the computer and take a walk or two or three every hour or so while doing this. It ain't a mindless task and requires a lot of sitting and careful looking as you plow through it.

Topic Closed

This topic has been closed to new replies.
