Migrating Wordpress comments to HashOver for Hugo

Wed, Feb 23, 2022 11-minute read

I’ve briefly covered thoughts around adding a comment system in to this blog since it was migrated to Hugo.

I won’t rehash the decision making process - but fundamentally I wasn’t going to go with Disqus and wanted a simpler solution. There are various options out there and tl;dr I ended up choosing HashOver, mostly for the following reasons:

  1. It appeared to be reasonably simple to stand it up completely external to the ‘main’ site, i.e. in a sub-domain
  2. It runs on PHP/MySQL which is already available on my host. Other commenting systems needed different stacks which I don’t have access to without switching.

Incidentally, HashOver supports other persistence options such as SQLite and even XML which if you wanted to be completely serverless which could be interesting; but as it happens this comment migration process would likely have been a fair whack harder.

For a more in-depth look at commenting systems, there’s plenty about such as this ultimate guide to comments for static sites.

Installation

Installation is reasonably simple. I found two useful guides - here and here.

These were enough to get it installed. It was installed in a sub-domain and the only additional point was to then install the system in a folder in that sub-domain, rather than the root of it. There seems to be some reference issues that meant CSS / scripts etc. didn’t load, which was resolved when it went in a sub-folder of the sub-domain. Perhaps the biggest bugbear is that it purposely blocks itself if you’re running locally (localhost). I can think of reasons this is the case, but also think of reasons this isn’t really necessary. It makes getting it setup and tested before deployment much harder, in my view.

But anyway, with it installed and working, it was time to think about moving existing comments from Wordpress over to it. I initially tried this script but it just didn’t work. At all. So I decided to figure it out myself.

Migrating comments from Wordpress

This will not be a deadset step–by-step guide for you to follow as I can’t be certain that what worked for me will work identically for everybody else. This was a first-pass method that did eventually work and at the moment there’s no real incentive to improve it. However if enough people find this useful then I would look at automating it some more.

I also took a hack and hope approach by reverse engineering the process and debugging what happens as opposed to really trying to understand what was happening. I did at one point start looking through the HashOver code but it is a pretty sizeable codebase which I didn’t want to modify. Nevertheless, I wouldn’t be surprised to learn there is a simpler way of doing this. Looking at other things around the web, there are ways of accessing Wordpress comments… XML RPC sounds like one way, so if you had a more complex requirement, then that might be an option. I assume there are WP plugins available too.

Caveats

This assumes you are going the MySQL or SQLite route; if you are using XML as the storage mechanism then YMMV. It also assumes you have full access to the old Wordpress database. It also assumes that your old Wordpress database and your new HashOver database are on the same server. Where any of these assumptions don’t hold for you, you may need to be more experimental with data imports/exports etc. to shifty it all around as required.

This may also be dependent on the method of migrating Wordpress in the first place - in my case this was done using the Jekyll exporter for Wordpress and then the Jekyll import built in to Hugo. This is likely important as it depends on URL structure.

Volumes

I have about 150 posts with around 500 comments to migrate. Large enough to mean doing it manually was not an option but few enough that I didn’t need to worry too much about performance or resource limits. Worth considering in your own case.

Key to the migration

The key to getting this working is understanding how HashOver knows how to attach specific comments to specific posts. I created a couple of test comments and then inspected them in phpMyAdmin. HashOver only has two tables, comments and page-info and comments unsurprisingly, is the main one we’re interested in.

Hashover MySQL

It has 20 or so columns that are fairly self-explanatory:

Hashover MySQL

We are therefore going to try and import comments from Wordpress in to this table.

The domain will depend on how you have configured HashOver during install; as one installation would support multiple ’tenants’.

The key here is figuring out which columns must be populated in order for comments to appear, and the main one here is thread - this relates to the URL of the post. By comparing this with the URL structure of the blog, it transpires that the thread field in the database is the URL of the post with back slashes replaced by hyphens.

This can be controlled using the url variable in your Hugo posts’ front-matter.

Example

This post’s url, less the domain, is

/general/migrating-wordpress-comments-to-hashover-for-hugo

which is set in the front matter as so:

url: /general/migrating-wordpress-comments-to-hashover-for-hugo

therefore the corresponding thread value in HO will be:

general-migrating-wordpress-comments-to-hashover-for-hugo

Automated linking

Once we understand the linking of the two, we now need to find a way to link comments in our old Wordpress blog to the new URL structure. This might be the bit where I got lucky in

  1. how I migrated from Wordpress to Hugo in the first place because I believe the Jekyll export sets that url front matter, and
  2. the permalinking structure I had in place at the old blog.

Without that, you would possibly have to tag each post manually with the url variable, or find another way to join the two up based on your own URL structures or similar.

Why?

In order to automate the link, we need some common join between the old post and its comments to the new post and its (soon-to-be) comments. For me, the final portion of the URL in the new blog corresponds to the post_name field back in Wordpress land.

In other words, from the above example, if this post had been created in Wordpress, then the post_name field of the wp_posts table would have been general-migrating-wordpress-comments-to-hashover-for-hugo and we can therefore use that to create the required join.

Step 1: Finding all the old comments

This is reasonably simple with some SQL. I’ve specified here the columns I was interested in for populating in HashOver; modify these to suit.

To begin with, I wanted some basic info about the popular posts (with most comments) to get a feel for the challenge:

SELECT
    p.post_title,
    c.comment_post_ID,
    count(*)
from
    wp_comments c
    left join wp_posts p on c.comment_post_ID = p.id
where
    c.comment_approved = 1
group by
    p.post_title,
    c.comment_post_ID
order by
    count(*) desc

This will show you a list of all posts with the comment count. Variations on the above will help you filter / restrict which posts / comments you ultimately export, so modify this to suit:

SELECT
    p.post_title,
    p.post_name,
    c.comment_post_ID,
    c.comment_author, 
    c.comment_author_email,
    c.comment_author_IP,
    c.comment_date,
    c.comment_content
from
    wp_comments c
left join wp_posts p on c.comment_post_ID = p.id
where
    c.comment_approved = 1

Notice there the all important post_name getting yanked out as well.

Step 2: Getting a list of all your new posts, their URLs and their names

This is the janky bit. What we need now is to generate the thread value for the comments table, based on the url of the new post.

In other words, you need to get a list of all the urls for all your new posts from your new posts. This ensures the posts actually exist in the new world and you don’t need to know about the old permalinking structure. It does, of course, assume that the post_name has not changed during the migration.

Jank #1

When I was doing this initially, I was thinking it would be super handy if the hugo runtime had an option for this and knowing no different I hacked together a manual method. And of course, now that I actually look at it more carefully, it does.

My original method was to interrogate the sitemap.xml file and use XQuery to extract all the <loc> fields, then modifying the output down to the list of posts.

Better though, is you can use hugo list all which gives you a comma separated list of all your posts and it’s the last column that we’re interested in.

Jank #2

The URL that you’ve just extracted at the moment will be the full URL to the post, including domain and folder structure. We need to arrive at two variants on this; one the post_name equivalent only and then the thread friendly variant of the full URL as described earlier.

To do this use whatever tools or scripts you’re most comfortable with to do some string manipulation; whether that’s code, script, manually or even… Excel?

=RIGHT(A2,LEN(A2)-FIND("@",SUBSTITUTE(A2,"/","@",LEN(A2)-LEN(SUBSTITUTE(A2,"/",""))),1))

(You’re welcome.)

Example

With URL values of, e.g.

https://matt-thornton.net/general/migrating-wordpress-comments-to-hashover-for-hugo/
https://matt-thornton.net/tech/sharepoint/some-other-topic-about-something/`

we need to end up with a table of

post_name thread
migrating-wordpress-comments-to-hashover-for-hugo general-migrating-wordpress-comments-to-hashover-for-hugo
some-other-topic-about-something tech-sharepoint-some-other-topic-about-something

To simplify things, I created a MySQL table of these values in the old Wordpress database called tmp_hugo_posts.

You obviously don’t need to create actual tables / views for all this stuff; do whatever you’re comfortable with…

You can then incorporate the new table to your original query…

SELECT
    p.post_title,
    p.post_name,
    c.comment_post_ID,
    c.comment_author, 
    c.comment_author_email,
    c.comment_author_IP,
    c.comment_date,
    c.comment_content,
    h.thread
from
    wp_comments c
left join wp_posts p on c.comment_post_ID = p.id
inner join tmp_hugo_posts h on p.post_name = h.post_name
where
    c.comment_approved = 1

You’ll note it’s an inner join on the new table. I’ve done this because I did not migrate all the posts from my original Wordpress into Hugo and therefore there may be comments for posts that do not actually exist - this will ensure we only get comments for posts that do exist in Hugo.

With this working, I created a view from the SQL, called view_tmp_comments_for_hugo in the Wordpress database.

Bringing it all together

You should now have everything you need to import the Wordpress comments and thread value in the HashOver comments table.

SET @count = 0;
SET @currthread = null;    
SELECT 'domain',
        @count := CASE
            WHEN @currthread = url 
			THEN @count := @count + 1
        ELSE 1 END num,
        @currthread := url url,
        comment_content,
        comment_date,        
        comment_author,
        comment_author_email,
        comment_author_ip
   FROM wordpress.`view_tmp_comments_for_hugo` WHERE 1 LIMIT 100
   ORDER BY url, comment_date

This will show you what’s going to happen with the first 100 records.

There are a few things to say here. The first is that it assumes you are able to do cross-database calls. PMA allows this but YMMV. It assumes your hashover database is called hashover and your Wordpress database is called wordpress so you may need to tweak these. You can also see where it references the view from before, you could do this as a sub-query if you prefer.

Multiple comments on a post

Secondly, the query variables are a necessity because of the way HashOver deals with multiple comments on posts. It expects the comment field to be an incrementing number for each comment in the thread, which then resets with the next thread.

The issue here is if e.g., you just set it 1 for all of them, then you only ever see 1 comment per post. If you do not properly increment the count by thread (or have gaps in the numbering) then HashOver interprets this as deleted comments.

We therefore need to use a window function to number each row in the group (thread) and then reset it each time we process a new thread.

If you have access to MySQL 8, then you could use the actual ROW_NUMBER() window function with the OVER clause. Check the MySQL docs for more info, but it would (probably) look a bit like this:

SELECT 'domain',
        ROW_NUMBER() OVER (PARTITION BY url ORDER BY url)
        url,
        comment_content,
        comment_date,        
        comment_author,
        comment_author_email,
        comment_author_ip
   FROM wordpress.`view_tmp_comments_for_hugo` WHERE 1 LIMIT 100
   ORDER BY url, comment_date

Unfortunately my host is running an older version of MySQL so I had to come up with this pretty hacky workaround, that nevertheless achieves the same thing.

Go!

Once you’re happy this is doing what you need, it’s time to fire it in:

SET @count = 0;
SET @currthread = null;
    INSERT INTO
    hashover.`comments` (
        `domain`,
        `comment`,
        `thread`,
        `body`,        
        `date`,
        `name`,        
        `email`,        
        `ipaddr`
    )
SELECT 'domain',
        @count := CASE
            WHEN @currthread = url 
			THEN @count := @count + 1
        ELSE 1 END num,
        @currthread := url url,
        comment_content,
        comment_date,        
        comment_author,
        comment_author_email,
        comment_author_ip
   FROM wordpress.`view_tmp_comments_for_hugo` WHERE 1 
   ORDER BY url, comment_date

And with that, your posts should now have lots of lovely comments migrated over 👍 .

Hope this was useful; please do (unironically) leave a comment below with any thoughts or issues.

Posts in this Series