1. Original Entry + Comments2. Write a Comment3. Preview Comment


October 16, 2014  |  Playing with the Tumblr API: Part 3  |  4974 hit(s)

Carrying on with adventures using the Tumblr API. (Part 1, Part 2)

As noted, I decided that I wanted to create a local HTML file out of my downloaded/exported Tumblr posts. In my initial cut, I iterated over the list of TumblrClass instances that I'd assembled from the downloaded posts, and I then wrote out a bunch of hard-coded HTML. This worked, but was inflexible, to say the least—what if I wanted to reorder items or something?

So I fell back on yet another old habit. I created a "template" of the HTML block that I wanted, using known strings in the template that I could swap out for content. Here's the HTML template layout, where strings like %%%posttitle%%% and %%%posturl%%% are placeholders for where I want the HTML to go:
<!-- tumblr_block_template.html -->
<div class="post">
    <div class="posttitle">%%%posttitle%%%</div>
    <div class="postdate">%%%postdate%%%</div>
    <div class="posttext">%%%posttext%%%</div>
    <div class="postsource">%%%postsource%%%</div>
    <div class="posturl"><a href="%%%posturl%%%"
        target="_blank">%%%posturl%%%</a></div>
    <div class="postctr">[%%%postcounter%%%]&nbsp;
        <span class="posttype">%%%posttype%%%</span>
    </div>
</div>
The idea is to read the template, read each TumblrClass item, swap out the appropriate member for the placeholder, and build up a series of these blocks. Here's the code to read the template and build the blocks of content:
html_output = ''
 
html_file = open('c:\\Tumblr\\tumblr_block_template.html', 'r')
html_block_template = html_file.read()
html_file.close()
 
ctr = 0
for p in sorted_posts:
    new_html_block = html_block_template
    ctr += 1
    new_html_block = new_html_block.replace('%%%posttitle%%%', p.post_title)
    new_html_block = new_html_block.replace('%%%postdate%%%', p.post_date)
    new_html_block = new_html_block.replace('%%%posttext%%%', p.post_text)
    new_html_block = new_html_block.replace('%%%postsource%%%', p.post_source)
    new_html_block = new_html_block.replace('%%%posturl%%%', p.post_url)
    new_html_block = new_html_block.replace('%%%postcounter%%%', str(ctr))
    html_output += new_html_block
To embed these <div> blocks into an HTML file, I did the same thing again—I created a template .html file that looks like this:
<!-- tumblr_template.html -->
<html>
<head>
  <link rel="stylesheet" href="tumbl_posts.css" type="text/css">
  <meta http-equiv="content-type" content="text/html;charset=utf-8">
</head>
<body>
<h1>Tumblr Posts</h1>
%%%posts%%%
</body>
</html>
With this in hand, I can read the template .html file and do the swap thing again, and then write out a new file. To actually write the file, I generated a timestamp to use as part of the file name: 'tumbl_bu-' plus %Y-%m-%d-%H-%M-%S plus '.html'.

There was one complication. I got some errors while writing the file out, which turned out to be an issue with Unicode encoding—apparently certain cites that I pasted into Tumblr contain characters that can’t be converted to ASCII, which is the default encoding for writing out a file. The solution there is to use the codecs module to convert. (It’s possible that this is a problem only in Python 2.x.)

Here’s the complete listing for the Python script. (I wrapped some of the lines in a Python-legal way to squeeze them for the blog.)
import datetime,json,requests
import codecs # For converting Unicode in source

class TumblrPost:
def __init__(self,
post_url,
post_date,
post_text,
post_source,
post_title,
post_type):
self.post_url = post_url
self.post_date = post_date
self.post_text = post_text
self.post_source = post_source
self.post_type = post_type
if post_title is None or post_title == '':
self.post_title = ''
else:
self.post_title = post_title

all_posts = [] # List to hold instances of the TumblrPost class
html_output = '' # String to hold the formatted HTML for all the posts
folder_name = 'C:\\Tumblr\\'

# Get the text posts and add them as TumblrPost objects to the all_posts_list
print "Fetching text entries ..."
request_url = 'http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/text?api_key=[MY_KEY]'
offset = 0
posts_still_left = True
while posts_still_left:
request_url += "&offset=" + str(offset)
print "\tFetching text entries (%i) ..." % offset
tumblr_response = requests.get(request_url).json()
total_posts = tumblr_response['response']['total_posts']
for post in tumblr_response['response']['posts']:
# See https://www.tumblr.com/docs/en/api/v2#text-posts
p = TumblrPost(post['post_url'],
post['date'],
post['body'], '',
post['title'],
'text') # No source for text posts
all_posts.append(p)
offset += 20
if offset > total_posts:
posts_still_left = False

# Get the quotes posts and add them as TumblrPost objects to the all_posts_list.
print "Fetching quote entries ..."
request_url = 'http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/quote?api_key=[MY_KEY]'
offset = 0
posts_still_left = True
while posts_still_left:
request_url += "&offset=" + str(offset)
print "\tFetching quote entries (%i) ..." % offset
tumblr_response = requests.get(request_url).json()
total_posts = tumblr_response['response']['total_posts']
for post in tumblr_response['response']['posts']:
# See https://www.tumblr.com/docs/en/api/v2#quote-posts
p = TumblrPost(post['post_url'],
post['date'],
post['text'],
post['source'], '',
'quote') # No title for quote posts
all_posts.append(p)
offset += 20
if offset > total_posts:
posts_still_left = False

sorted_posts = sorted(all_posts,
key=lambda tpost: tpost.post_date,
reverse=True)

print "Creating HTML file ..."

# Read a file that contains the HTML layout of the posts,
# with placeholders for individual bits of data
html_file = open(folder_name + 'tumblr_block_template.html', 'r')
html_block_template = html_file.read()
html_file.close()

ctr = 0
for p in sorted_posts:
new_html_block = html_block_template
ctr += 1
new_html_block = new_html_block.replace('%%%posttitle%%%', p.post_title)
new_html_block = new_html_block.replace('%%%postdate%%%', p.post_date)
new_html_block = new_html_block.replace('%%%posttext%%%', p.post_text)
new_html_block = new_html_block.replace('%%%postsource%%%', p.post_source)
new_html_block = new_html_block.replace('%%%posturl%%%', p.post_url)
new_html_block = new_html_block.replace('%%%postcounter%%%', str(ctr))
new_html_block = new_html_block.replace('%%%posttype%%%', p.post_type)
html_output += new_html_block

# The template has a placeholder for the content that's generated dynamically
html_file = open(folder_name + 'tumblr_template.html', 'r')
html_file_contents = html_file.read()
html_file.close()
html_file_contents = html_file_contents.replace('%%%posts%%%', html_output)

# Open (i.e., create) a new file with the ability to write Unicode.
# See http://stackoverflow.com/questions/934160/write-to-utf-8-file-in-python
file_timestamp = datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
with codecs.open(folder_name +
'tumbl_bu-' +
file_timestamp +
'.html', 'w', "utf-8-sig") \
as new_html_file:
new_html_file.write(html_file_contents)
new_html_file.close()

print 'Done!'