XML file too big to import

July 12, 2008 by Wolfie · 10 Comments
Filed under: Blogging, Technology 

Part of “Going self-hosted with Wordpress : A Wolfie Guide”

On one of my older posts (Going Self-Hosted with Wordpress) Lisa asked how to import an XML file larger than 2MB from her existing WordPress.com blog into her new self-hosted WordPress installation.

The first thing to say is that this is not a WordPress restriction, it is a restriction of the hosting company being used. If you have cPanel loaded on your host, take a look in ‘PHP Configuration’ and you’ll see that ‘upload_max_filesize’ is set to 2MB. (For some hosts this number may be smaller or larger; as always, your mileage may vary). There is a way that you can change this value, although I’ve only managed to make it 8MB on my server. (Before going ahead and making any of the changes that follow, please make sure that you have a working back-up of anything that you can’t afford to lose - just in case. I will not accept any responsibility for anything that goes wrong with your system and make no promises that any of these methods will work for you).

In your public_html directory, there should be a file called .htaccess. This is a small text file that, at least on my server, looks like this:

# BEGIN WordPress

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

If you add the line php_value upload_max_filesize 32M just before the final line (# END WordPress), then when you go to the import screen you’ll be told that the maximum file size is 8MB; on my server, at least, that is as high as it will go, whatever value the line asks for. Whether the system will actually allow you to import an 8MB file is a different question, and I have not been able to test this. Assuming it does work, once you’ve uploaded the XML file I suggest editing .htaccess again to take out the alteration.

If that does not work, then as far as I’m aware, you can’t change this setting yourself but your hosting company should be able to change it for you. If they can’t / won’t, then you need to look at a different solution which involves splitting the XML file into several smaller pieces.

First, get your exported XML file: from your wp.com dashboard go to ‘Manage’, then click ‘Export’. You get the option to restrict authors, but this won’t apply to most people. WordPress then saves an XML file to your hard drive. Next, open the file in a text editor. I use TextWrangler, because it’s free and because it helpfully colour-codes tags, but anything should work, even WordPad. What you’ll see is a huge block of text, with lots of things in tags (such as <channel>, <rss> and <item>). All this text is what WordPress will use to reconstruct your blog on your new installation.

But that upload limit is a bit of a pain. I didn’t experience this issue when I moved The New Wolfs Howl because the export file was quite small (even now it’s only 1.4MB) but after a quick search around the forums it seems that this is quite a common problem. Unfortunately, splitting the XML file is not quite as simple as putting the second half of the file in a different document; there are certain things that have to be in each file.

The first thing to do is to work out how many files you need to split the file into. If your upload limit is 2MB and you have an 8MB file, then I would suggest you need to have five files. I know that eight divided by two is four, but I’ve added one to take care of the overlap (every file repeats the same header and footer). That will then give you a rough idea of how much of your file has to be moved each time. For example, my XML file is just over 21,500 lines, so split four ways I’d want just over 5,000 lines per file.
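The sums above are easy to script. This is only a sketch in Python: the function names and the "one spare file" default are my own labels for the rule of thumb described here, not anything WordPress provides.

```python
import math

MB = 1024 * 1024


def pieces_needed(export_bytes: int, limit_bytes: int, spare: int = 1) -> int:
    """Size divided by the upload limit, rounded up, plus spare files
    to allow for the header and footer repeated in every piece."""
    return math.ceil(export_bytes / limit_bytes) + spare


def lines_per_piece(total_lines: int, pieces: int) -> int:
    """Rough number of lines to move into each piece."""
    return math.ceil(total_lines / pieces)


print(pieces_needed(8 * MB, 2 * MB))   # 5
print(lines_per_piece(21500, 4))       # 5375
print(lines_per_piece(21500, 5))       # 4300
```

Treat the line counts as a guide only: posts vary a lot in length, so it is the size of each finished file that has to come in under the limit.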

Take a look at your XML file and at the top you’ll see there are various items of header code (instructions from WordPress, etc). From the top line of the file (<?xml version=…) to <wp:base_blog_url>http://… needs to be in every file. Scroll right to the bottom of the file and </channel> and </rss> also need to be in every file. So, before any content has gone in, you want an XML file that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- This is a WordPress eXtended RSS file generated by WordPress as an export of your blog. -->
<!-- It contains information about your blog's posts, comments, and categories. -->
<!-- You may use this file to transfer that content from one site to another. -->
<!-- This file is not intended to serve as a complete backup of your blog. -->
<!-- To import this information into a WordPress blog follow these steps. -->

<!-- 1. Log into that blog as an administrator. -->
<!-- 2. Go to Manage: Import in the blog's admin panels. -->
<!-- 3. Choose "WordPress" from the list. -->
<!-- 4. Upload this file using the form provided on that page. -->
<!-- 5. You will first be asked to map the authors in this export file to users -->
<!-- on the blog. For each author, you may choose to map to an -->
<!-- existing user on the blog or to create a new user -->
<!-- 6. WordPress will then import each of the posts, comments, and categories -->
<!-- contained in this file into your blog -->

<!-- generator="WordPress/2.5.1" created="2008-07-12 05:47"-->

<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:wp="http://wordpress.org/export/1.0/"
>

<channel>
<title>Your Blog Title</title>
<link>http://yourblogdomain.com</link>
<description>Your blog descriptions</description>
<pubDate>Fri, 11 Jul 2008 19:33:37 +0000</pubDate>
<generator>http://wordpress.org/?v=2.5.1</generator>
<language>en</language>
<wp:wxr_version>1.0</wp:wxr_version>
<wp:base_site_url>http://yourdomain.com</wp:base_site_url>
<wp:base_blog_url>http://yourblogdomain.com</wp:base_blog_url>

[this is where the content will go]

</channel>
</rss>
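If you would rather not copy the header by hand, something like the following Python sketch could pull it out of the export for you. It assumes the header ends at the closing </wp:base_blog_url> tag, as in the skeleton above; the function names are made up for this example, and I have not run it against a real WordPress.com export.

```python
import xml.etree.ElementTree as ET

HEADER_END = "</wp:base_blog_url>"   # last header line, as described above
FOOTER = "</channel>\n</rss>\n"      # closing tags needed in every file


def make_skeleton(export_text: str) -> tuple[str, str]:
    """Return the (header, footer) pair to wrap around every split file."""
    end = export_text.index(HEADER_END) + len(HEADER_END)
    return export_text[:end] + "\n", FOOTER


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML (every tag opened is also closed)."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError:
        return False
    return True
```

Checking `is_well_formed(header + footer)` before adding any content is a quick way to confirm that nothing was lost from the top or bottom of the file.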

So, now you need to get your content in there. In the first of the XML files, you’ll want to make sure that you include your categories and your tags; these are listed immediately after the <wp:base_blog_url> line. They only need to be included in one file. Then the rest of the file is filled up with content; just look for <item> and </item> tags and cut and paste information between files. Always make sure you only copy complete items, though, otherwise you’ll have an error.
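The cut-and-paste routine could also be automated along these lines. Treat this Python sketch as an illustration rather than a tested tool: the name split_export, the byte limit argument and the plain text matching on <item> and </item> are all my own assumptions, and a post that happens to contain a literal </item> in its content would confuse it.

```python
import re

HEADER_END = "</wp:base_blog_url>"
FOOTER = "</channel>\n</rss>\n"
ITEM_RE = re.compile(r"<item>.*?</item>", re.DOTALL)  # one complete item


def size(text: str) -> int:
    """Size in bytes once saved as UTF-8."""
    return len(text.encode("utf-8"))


def split_export(export_text: str, max_bytes: int) -> list[str]:
    """Split an export into complete files, each under max_bytes if possible."""
    head_end = export_text.index(HEADER_END) + len(HEADER_END)
    header = export_text[:head_end] + "\n"
    items = ITEM_RE.findall(export_text)
    if not items:
        raise ValueError("no <item> blocks found")
    # Categories and tags sit between the header and the first item;
    # they go into the first file only.
    first_item = export_text.index("<item>", head_end)
    taxonomy = export_text[head_end:first_item].strip()

    overhead = size(header) + size(FOOTER)
    pieces = []
    body = [taxonomy] if taxonomy else []
    used = overhead + sum(size(part) + 1 for part in body)
    has_item = False
    for item in items:
        cost = size(item) + 1  # +1 for the newline after the item
        if has_item and used + cost > max_bytes:
            pieces.append(header + "\n".join(body) + "\n" + FOOTER)
            body, used, has_item = [], overhead, False
        body.append(item)  # only ever whole items, never part of one
        used += cost
        has_item = True
    pieces.append(header + "\n".join(body) + "\n" + FOOTER)
    return pieces
```

Each string in the returned list would then be saved as its own .xml file and imported in order, first file first. A single item bigger than the limit still ends up in a file of its own that is too large, so it is worth checking the file sizes before uploading.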

This way of splitting files is a laborious process and will take a fair while, but will work if you do it properly. There are file splitting utilities out there, but I have not tested any of them for effectiveness (or simplicity).
