Wednesday, 2 November 2011

Converting Google Reader Starred Items to Firefox Bookmarks

[Starred Items has not been removed from Google Reader.

I think that I wasn't seeing stars after the changeover because I was using Firefox 3.5 as my browser. Google no longer support versions lower than 3.6. It's either that or I have UI fatigue. That's a real thing, isn't it?]

Google made many changes to Google Reader yesterday. They were unexpected and unwelcome to many of the people who use Google Reader as their main portal into the Internet.

The big change was Google replacing Reader's Sharing feature with the Google+ one, which seems like an improvement. But they simplified Google Reader while they were at it, and one of the simplifications was to remove the Starred Items feature.

I had been using Starred Items as a bookmarks folder for feeds. I had 413 Starred Items.

Google have given us the option of downloading our Starred Items data. Look in the Import/Export tab of Reader's Manage Subscriptions page. (You will have to be logged into Google Reader to follow that link.) The Reader JSON file contains the full data of the posts that were in your Starred Items. But since there is no program that can convert the raw data in that file into a usable form, it is not the mercy it seems.
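To get a feel for what the export holds before converting it, a minimal sketch along these lines counts starred items per feed. It assumes the layout my own file had: a top-level 'items' list whose entries carry an 'origin' dictionary with a 'title'. The function name count_items_per_feed and the sample data are made up for illustration.

```python
import json
from collections import Counter

def count_items_per_feed(reader_data):
    """Count starred items per feed title in parsed Reader JSON."""
    counts = Counter()
    for item in reader_data.get('items', []):
        # Assumption: some items may lack 'origin' or its 'title',
        # so fall back to a placeholder instead of raising KeyError.
        title = item.get('origin', {}).get('title', '(unknown feed)')
        counts[title] += 1
    return counts

# Illustrative sample only, not real export data.
sample = json.loads('''
{"items": [
    {"origin": {"title": "Blog A"}, "title": "First post"},
    {"origin": {"title": "Blog A"}, "title": "Second post"},
    {"origin": {"title": "Blog B"}},
    {"title": "Orphan item"}
]}
''')

for feed, count in sorted(count_items_per_feed(sample).items()):
    print('%s: %d' % (feed, count))
```

With a real export, replace the sample with json.load() on the downloaded file.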

I wrote myself a Python script to convert my Reader JSON file into a file that I could import into my Firefox browser's bookmarks list. So I now have all my bookmarks back: a Starred Items bookmark menu folder, with a subfolder for each blog, and the bookmarks themselves lead to the original posts in those blogs.

I've appended my script in the box below. If you know how to run a Python script, you will find it useful. If you know how to program in Python, I encourage you to convert it into a form that a non-programmer can use. If you desperately need your Starred Items, and they are not personal, I can convert them for you - leave a comment.

# firefoxize-starred-items.py
#
# Reads a Google "Reader JSON" file exported from Google Reader and
# outputs an HTML file suitable for importing into Firefox's 
# bookmarks menu. This rescues you if you have been using Google 
# Reader Starred Items as a bookmark file for feeds.
#
# See http://googlereader.blogspot.com/2011/10/new-in-reader-fresh-design-and-google.html
# and, when logged in, http://www.google.com/reader/settings?display=import

import json, time, codecs

InputFile = '/home/glyn/Desktop/starred-items.json'  #download this
OutputFile = '/home/glyn/Desktop/starred-items-bookmarks.html'  #import this

with codecs.open(InputFile, 'r', encoding='utf-8') as f:
    GooglesItems = json.load(f)['items']

FeedURLs = {}
FeedItems = {}

for item in GooglesItems:
    feedTitle = item['origin']['title']
    feedUrl = item['origin']['htmlUrl']
    itemDate =  item['published']
    # Use only the first line of the title; fall back to feed name plus date.
    if 'title' in item:
        itemTitle = item['title'].split('\n')[0]
    else:
        itemTitle = feedTitle + ', ' +  time.strftime('%x', time.localtime(itemDate))
    # Prefer the post's own link, then an enclosure, then the feed's home page.
    if 'alternate' in item:
        itemURL = item['alternate'][0]['href']
    elif 'enclosure' in item:
        itemURL = item['enclosure'][0]['href']
    else:
        itemURL = feedUrl
    FeedURLs[feedTitle] = feedUrl
    feedItems = FeedItems.setdefault(feedTitle, [])
    feedItems.append((itemTitle, itemURL, itemDate))

with codecs.open(OutputFile, 'w', encoding='utf-8') as b:
    b.write('''<!DOCTYPE NETSCAPE-Bookmark-file-1>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
    <TITLE>Bookmarks</TITLE>
    <H1>Bookmarks Menu</H1>
    <DL><p>\n''')
    b.write('<DT><H2>Google Reader Starred Items</H2>\n\n')
    b.write('<DL><p>\n')
    for feedTitle, feedURL in FeedURLs.items():
        b.write('<DT><H3>%s</H3>\n' % feedTitle)
        b.write('<DL>\n')
        b.write('<DT><A HREF="%s">(%s)</A>\n' % (feedURL, feedTitle))
        for (title, url, date) in FeedItems[feedTitle]:
            b.write('<DT><A HREF="%s" LAST_MODIFIED="%i">%s</A>\n' % (url, date, title))
        b.write('</DL>\n\n')
    b.write('</DL>\n\n\n')
    b.write('</DL><p>\n')

8 comments:

Richard Carter said...

I'm confused: my new Google Reader still has starred items!

Glyn said...

Oh God, you're right. Those are big stars right there. I don't understand how I couldn't see them.

I think it might have been because I was using Firefox 3.5, which Google considers obsolete, up until yesterday afternoon.

daniel contarelli said...

Hi Glyn, your script is still very useful for those of us who want to use the starred items in another way, such as importing them into an old Palm TX for reading offline, as I do :)
But I had a little problem with the script: after all the articles were parsed, I got this error:


Traceback (most recent call last):
  File "firefoxize-starred-items.py", line 23, in <module>
    feedTitle = item['origin']['title']
KeyError: 'title'


Any clue?
Thanks!

Glyn said...

The items don't have a consistent format. I think that they are straight JSON representations of RSS and Atom posts, whether those posts had complete sets of fields or not. So I had to experiment a bit to get my starred items list read, and it looks like you'll have to experiment a little further.

If you use this as your parsing loop, it will print out the item that causes your problem, and you can decide the best thing to do with it:

import pprint
pp = pprint.PrettyPrinter()

for item in GooglesItems:
    try:
        ...
    except KeyError:
        pp.pprint(item)
        break
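Once you know which field is missing, one option is to read every field with a fallback instead of indexing it directly. This is only a sketch: describe_item is a made-up helper name, the placeholder strings are arbitrary, and the field names are the ones my own export happened to use.

```python
import time

def describe_item(item):
    """Return (feedTitle, feedUrl, itemTitle, itemURL, itemDate) with fallbacks."""
    origin = item.get('origin', {})
    feedUrl = origin.get('htmlUrl', '')
    # Fall back to the feed's URL, then a placeholder, when the title is missing.
    feedTitle = origin.get('title') or feedUrl or 'Untitled feed'
    itemDate = item.get('published', 0)
    if 'title' in item:
        itemTitle = item['title'].split('\n')[0]
    else:
        itemTitle = feedTitle + ', ' + time.strftime('%x', time.localtime(itemDate))
    # Try the post's own link, then an enclosure, then the feed's home page.
    for key in ('alternate', 'enclosure'):
        links = item.get(key)
        if links:
            itemURL = links[0].get('href', feedUrl)
            break
    else:
        itemURL = feedUrl
    return feedTitle, feedUrl, itemTitle, itemURL, itemDate

# Illustrative item with no feed title, like the one in the traceback above.
example = {
    'origin': {'htmlUrl': 'http://example.com/'},
    'published': 0,
    'title': 'Hello\nworld',
    'alternate': [{'href': 'http://example.com/post'}],
}
print(describe_item(example))
```

Inside the parsing loop, the five values returned here would stand in for the individual lookups.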

(BTW, if you want to read posts offline most of them have a 'content' field containing the HTML body of the post.)
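A rough sketch of pulling that body out, written for Python 3. It assumes the 'content' field (or, failing that, a 'summary' field) is a dictionary whose own 'content' key holds the HTML, which is how the items I looked at were laid out; yours may differ. The names item_body and item_to_html are invented for this example.

```python
import html

def item_body(item):
    """Return the HTML body of a starred item, or '' if none is present."""
    for key in ('content', 'summary'):
        field = item.get(key)
        if isinstance(field, dict):
            # Assumption: the HTML sits under a nested 'content' key.
            field = field.get('content')
        if field:
            return field
    return ''

def item_to_html(item):
    """Build a small HTML fragment: escaped title heading, then the raw body."""
    title = item.get('title', 'Untitled').split('\n')[0]
    return '<h1>%s</h1>\n%s\n' % (html.escape(title), item_body(item))

# Illustrative item, not real export data.
example = {'title': 'A & B', 'content': {'content': '<p>Body text</p>'}}
print(item_to_html(example))
```

Writing one such fragment per item to a file would give something readable offline.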

norz said...

I found your script very interesting, thank you! However, what I'm looking for is a way to import Google Reader items with the content of the post. Maybe by importing them as email messages in Thunderbird?

Melanie said...

Thanks!!! :-) I finally succeeded in exporting as HTML. Unfortunately, I can't get an automatic dump; I have to download the JSON file manually.
And like norz, I'm also looking to get the content. I suppose it's not so difficult to add another variable. Maybe I will make some attempts, but I'm not sure I will be able to get it.
Hope Google will provide something more intuitive...
But whatever, thanks!

Glyn said...

Well, I'm glad someone's finding my overly-hasty code useful.

I don't know how to import unread posts from Google Reader accounts. Googling "Python Google Reader API" will bring up quite a bit on the subject, but it all looks pretty involved.

Maybe it would be better to write a script that reads posts from the feeds you want directly, rather than going through Google Reader? This library looks a good place to start:

http://packages.python.org/feedparser/

Glyn said...

Wait! This person has something that works beautifully:

http://blog.yjl.im/2010/08/using-python-to-get-google-reader.html