About

I'm Mike Pope. I live in the Seattle area. I've been a technical writer and editor for over 30 years. I'm interested in software, language, music, movies, books, motorcycles, travel, and ... well, lots of stuff.







Blog Statistics

Dates
First entry - 6/27/2003
Most recent entry - 7/21/2017

Totals
Posts - 2441
Comments - 2554
Hits - 1,968,118

Averages
Entries/day - 0.47
Comments/entry - 1.05
Hits/day - 383



  12:24 PM

I rassled a bit recently with a couple of dumb issues when creating some Word macros, so I thought I'd better write these up for my own future reference. To be clear, "dumb" here means that I should already have known this stuff, and I wasted time learning it.

1. Calling subroutines

I was trying to call a sub like this:
Sub SomeMacro()
    SomeOtherSub(p1, p2)
End Sub
Word got so mad about that SomeOtherSub call that it refused to compile the macro at all.
Turns out that when you call a subroutine in VBA and pass parameters, you do that without parentheses:
SomeOtherSub p1, p2
The parameters can be positional, as here, or named. For the latter, use the := syntax:
SomeOtherSub p1:="a value", p2:="another value"
(If you really want the parentheses, you can keep them by using the Call keyword: Call SomeOtherSub(p1, p2).)

2. Exposing subroutines (implicit access modifiers)

Here was another kind of bonehead mistake I made. I wrote a subroutine sort of like this:
Sub MyMacro(param1 As String, param2 As String)
    ' Code here
End Sub
Then I tried to actually run this macro (Developer > Macros). The macro stubbornly refused to appear in the Macros dialog box. If I was in the macro editor and pressed F5 to try to launch it in the debugger, Word just displayed the Macros dialog box for me to pick which macro to run—and again, the macro I wanted was nowhere in the list.

Anyway, long story short (too late, haha), the problem was that the Sub definition included parameters:
Sub MyMacro(param1 As String, param2 As String)
Apparently if a subroutine has parameters like that, VBA considers it to have protected access—it can be called from another macro, but it can't be launched as a main. This makes sense, but it wasn't immediately obvious. What I really wanted was this:
Sub MyMacro()
I had included the parameters by accident (copy/paste error), so it was basically a dumb mistake. I just removed them and then things worked. Well, they worked until VBA ran into the next dumb mistake, whatever that was. (In my code there's always another one.)



  02:35 PM

Another quick post about Word, primarily for my own benefit (when I forget this later).

Word has several options for how you can paste text. They are (in order):
  • Keep Source Formatting. This option keeps the original formatting (both character and paragraph formatting), but converts it to direct formatting.

  • Merge Formatting. This option copies basic character formatting (bold, italics, underline) as direct formatting, but does not copy any paragraph formatting.

  • Use Destination Styles. This option copies the text and applies styles that are in the target document. (This option appears only if there are matching styles in the target doc.)

  • Keep Text Only. This option copies the text as plain text, with no formatting.
I need the last one (paste plain text) more often than any of the others, so I want it on a keyboard shortcut. You can do this by recording a macro of yourself using the Keep Text Only option. But I realized there's an even easier way—just assign a keyboard shortcut to the built-in PasteTextOnly command.

I keep forgetting that most anything Word can do has a command. If a gesture requires just one command, you can assign a keyboard shortcut directly to it. Maybe writing this out will help me remember.

Update: I added a video!




  12:01 AM

This is another in a series of blog posts about how I configure Microsoft Word, which I add here primarily for my own reference.

I often use the Styles pane, and within that pane, I often want to change the styles that are displayed. Sometimes I want to see all the styles; sometimes just the styles that are defined in the current document; sometimes just the styles currently in use.

You can change this display by using a dialog box. In the Styles pane, click the Options link, and then use the dropdown lists to select which styles to display and how they're ordered, like this:


But that can get to be an annoying number of clicks if you're switching between these display options frequently. So, macros to the rescue. I recorded myself making one of these changes, then created a couple of variations to give me the different displays I want. Here are the macros I currently use, where the sub name is (I hope) self-explanatory:
Sub SetStylesPaneToAllAlphabetical()
    ActiveDocument.FormattingShowFilter = wdShowFilterStylesAll
    ActiveDocument.StyleSortMethod = wdStyleSortByName
End Sub

Sub SetStylesPaneToInCurrentDocument()
    ActiveDocument.FormattingShowFilter = wdShowFilterStylesAvailable
    ActiveDocument.StyleSortMethod = wdStyleSortByName
End Sub

Sub SetStylesPaneToInUse()
    ActiveDocument.FormattingShowFilter = wdShowFilterStylesInUse
    ActiveDocument.StyleSortMethod = wdStyleSortByName
End Sub
To complete the picture, I map the macros to these keyboard shortcuts:

ctrl+shift+p, a    SetStylesPaneToAllAlphabetical
ctrl+shift+p, c    SetStylesPaneToInCurrentDocument
ctrl+shift+p, u    SetStylesPaneToInUse



  12:23 AM

I have used Microsoft Word for years—decades—but hardly a week goes by when I don't learn something new. (Including things that are probably pretty well known to others, oh well.) Anyway, TIL about how to use the batch version of auto-formatting in Word. Since I think a lot of people already know this, I'm adding the information here primarily for later reference for myself.

Word has settings to perform "auto-formatting as you type." These include things like converting quotation marks into so-called smart quotes (i.e., typographical quotation marks), converting double hyphens (--) into em-dashes (—), converting typed fractions (1/2) into typographic fractions (½), etc. You set these options in the AutoCorrect dialog box: File > Options > Proofing, AutoCorrect Options button, AutoFormat As You Type tab.

It turns out that Word can also apply these auto-formatting instructions after the fact. In the same AutoCorrect dialog box, there's a tab named just AutoFormat:


This tab has most of the same options as auto-format-as-you-type. Here's the neat part: you can get Word to apply these formatting options on demand by pressing alt+ctrl+k. There's no UI gesture for it by default, but you can use the ribbon-customization feature to add the relevant command to the ribbon or Quick Access Toolbar.

A use case where I can see this working pretty well is if you paste text in from a text editor. (I do this all the time.)

Credit where it's due: I learned about this from the article How to Automatically Format an Existing Document in Word 2013 by Lori Kaufman on the How-To Geek site. As I say, I'm adding this info here primarily for my own benefit. :-)



  09:04 AM

I was reading a thread on a computer forum, and someone asked this question:
Quote:
Your password should contain at least 6 characters

If you're going to require it; don't say "should", say "must".
This set off an interesting discussion on the semantics of should in this context. I've written about this before, so I was interested to hear how people interpreted the example.

Here is a sampling of the more serious posts on the thread:

From the requirements document: "The password entered by the user should be rejected if it does not contain at least six characters." If I received that requirement from my boss, I would make darn sure that the password is rejected. I don't think I would randomly reject some and not others.


The software is being polite; it's anticipating users who do not like being told what to do.


If it says "should" then it is not optional, like in "could". You should be "this tall" to ride this ride.

A number of people pulled out dictionary definitions (Wiktionary, heh). And one person cited RFC 2119 ("Key words for use in RFCs to Indicate Requirement Levels"), which states:

MUST This word, or the terms "REQUIRED" or "SHALL", mean that the definition is an absolute requirement of the specification.

SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

All of which goes to the original poster's point—the message was ambiguous and should (ha) have been written with must. For those of us who don't keep a mental catalog of RFC recommendations, the more accessible Microsoft style guide says:

Use should only to describe a user action that is recommended, but optional. Use must only to describe a user action that is required.

In documentation, in error messages, in any context where the message needs to be clear and you aren't there to help the reader understand, avoid should when you mean must.



  09:30 PM

I found an interesting intersection recently of two things I think about a lot. One is traffic, a topic of perennial interest to me. The second is data visualizations, something that I'm comparatively new to but very interested in.

Let me back up slightly. Not long ago (maybe in 2013?), the state of Washington introduced variable speed limits in some areas that are prone to congestion, like on I-5 northbound approaching Seattle:



I was traveling with someone (my daughter, I think) who asked "Does that work?" To which my answer was that it could, if people actually obeyed these variable limits[1]. (Which they don't at all.) What's the theory?

On their website, the state explains variable speed limits this way:
Ideally, approaching traffic will slow down and pass through the problem area at a slower but more consistent speed reducing stop and go traffic. By reducing stop and go traffic we’re also reducing the probability of an accident by giving drivers more time to react to changing road conditions. This helps drivers avoid the need to brake sharply as they approach congestion.
Hmmm. This sort of describes the theory, but only in general terms. I also found the following on a different state site, which explains the theory even less, but does include a curious bonus reason (emphasis mine) for variable speed limits:
Variable speed limits offer considerable promise in restoring the credibility of speed limits and improving safety by restricting speeds during adverse conditions.
So let me give it a shot. Imagine that you want to go to the movies. You go to the ticket booth and buy tickets. Let's say that this transaction takes 30 seconds. Just as you finish, someone else walks up to buy their ticket. Just as they finish their 30-second transaction, a third person walks up, and so on. As long as people don't arrive at the ticket booth any more frequently than every 30 seconds, there's never a line.

But let's say that 15 seconds after you started buying your tickets, someone gets in line behind you. That person has to wait 15 seconds. And let's say people arrive at the ticket booth every 15 seconds from then on, but the ticket vendor can't go any faster than one transaction per 30 seconds. The result is that the line grows, and it continues to grow as long as people arrive at the queue faster than they can buy tickets. The ticket booth is a bottleneck, and the queue is congestion.

Make sense? Congestion results from people being added to a queue (or otherwise approaching a bottleneck) faster than they can leave it. This is as true for people buying movie tickets as it is for cars approaching a slowdown. If you can prevent people from joining the queue faster than they can leave it, you can reduce the delay. If you're selling movie tickets, I don't know how you prevent people from getting in line. But if you're managing traffic, you can try to keep people out of the congestion by slowing down how fast they get to the point where the slowdown occurs.
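The ticket-booth arithmetic is easy to check with a small back-of-the-envelope simulation (my own sketch, not anything from the state's site). It counts how many people are waiting in line after a given stretch of time, assuming a steady arrival rate and a single server with a fixed 30-second transaction time:

```python
def queue_length(arrival_interval, service_time, duration):
    """Return how many people are waiting in line after `duration`
    seconds, given one arrival every `arrival_interval` seconds and a
    fixed `service_time` per transaction (single ticket seller)."""
    arrivals = duration // arrival_interval + 1   # people who have shown up
    served = duration // service_time             # people who have finished
    waiting = arrivals - served - 1               # minus the one being served
    return max(waiting, 0)

# Arrivals every 30 seconds, 30-second transactions: no line ever forms.
print(queue_length(30, 30, 600))   # 0

# Arrivals every 15 seconds: after ten minutes, 20 people are waiting,
# and the line grows by one person every 30 seconds, without bound.
print(queue_length(15, 30, 600))   # 20
```

The same arithmetic applies to cars approaching a slowdown: as long as the arrival rate exceeds the service rate, the queue grows, which is exactly what metering the upstream speed is trying to prevent.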

Some people have understood this for a long time, and voluntarily slow down when it looks like traffic is heavy ahead. William Beaty has a great article (undated?) in which he dives deep on ways that even a few drivers who behave intelligently in congestion and during merges can improve flow for everyone. And while his suggestions undoubtedly work, they rely on people engaging in non-intuitive behavior, like allowing people to merge (gasp!) and leaving long-ish gaps ahead of them.

Since most people don't have the benefit of Beaty's insights, the state has decided to try variable speed limits: if people won't regulate their own speed in reaction to congestion ahead, the state (the state's computers) will attempt to do it for them.

This brings me to the visualization part of our story. Lewis Lehe is a graduate student in transportation engineering who's created a beautiful, interactive visualization that illustrates bottlenecks. (The viz is actually about the difference between bottlenecks, which is what I'm interested in here, and gridlock.) Lehe's visualization shows cars arriving at and leaving a bottleneck, and you can adjust the arrival rate to see interactively how congestion grows if cars arrive faster than they can leave (or vice versa). Click the link and then play with the viz to get a great sense of how variable speed limits could work.


An interesting promise of self-driving cars, like the one apparently forthcoming from Google, is that they could be a whole lot smarter than human drivers about driving in congested conditions. Assuming, of course, that humans aren't allowed to take control of a car that's driving—per their own sense—exasperatingly slow. That remains to be seen.

[1] In fact, the minimum speed (perhaps by federal law?) is 40 MPH, so there's definitely a bottom limit to when variable speeds could be effective.



  09:52 AM

Carrying on with adventures using the Tumblr API. (Part 1, Part 2)

As noted, I decided that I wanted to create a local HTML file out of my downloaded/exported Tumblr posts. In my initial cut, I iterated over the list of TumblrPost instances that I'd assembled from the downloaded posts, and I then wrote out a bunch of hard-coded HTML. This worked, but was inflexible, to say the least—what if I wanted to reorder items or something?

So I fell back on yet another old habit. I created a "template" of the HTML block that I wanted, using known strings in the template that I could swap out for content. Here's the HTML template layout, where strings like %%%posttitle%%% and %%%posturl%%% are placeholders for where I want the HTML to go:
<!-- tumblr_block_template.html -->
<div class="post">
    <div class="posttitle">%%%posttitle%%%</div>
    <div class="postdate">%%%postdate%%%</div>
    <div class="posttext">%%%posttext%%%</div>
    <div class="postsource">%%%postsource%%%</div>
    <div class="posturl"><a href="%%%posturl%%%"
        target="_blank">%%%posturl%%%</a></div>
    <div class="postctr">[%%%postcounter%%%]&nbsp;
        <span class="posttype">%%%posttype%%%</span>
    </div>
</div>
The idea is to read the template, read each TumblrPost item, swap out the appropriate member for the placeholder, and build up a series of these blocks. Here's the code to read the template and build the blocks of content:
html_output = ''
 
html_file = open('c:\\Tumblr\\tumblr_block_template.html', 'r')
html_block_template = html_file.read()
html_file.close()
 
ctr = 0
for p in sorted_posts:
    new_html_block = html_block_template
    ctr += 1
    new_html_block = new_html_block.replace('%%%posttitle%%%', p.post_title)
    new_html_block = new_html_block.replace('%%%postdate%%%', p.post_date)
    new_html_block = new_html_block.replace('%%%posttext%%%', p.post_text)
    new_html_block = new_html_block.replace('%%%postsource%%%', p.post_source)
    new_html_block = new_html_block.replace('%%%posturl%%%', p.post_url)
    new_html_block = new_html_block.replace('%%%postcounter%%%', str(ctr))
    new_html_block = new_html_block.replace('%%%posttype%%%', p.post_type)
    html_output += new_html_block
To embed these <div> blocks into an HTML file, I did the same thing again—I created a template .html file that looks like this:
<!-- tumblr_template.html -->
<html>
<head>
  <link rel="stylesheet" href="tumbl_posts.css" type="text/css">
  <meta http-equiv="content-type" content="text/html;charset=utf-8">
</head>
<body>
<h1>Tumblr Posts</h1>
%%%posts%%%
</body>
</html>
With this in hand, I can read the template .html file and do the swap thing again, and then write out a new file. To actually write the file, I generated a timestamp to use as part of the file name: 'tumbl_bu-' plus %Y-%m-%d-%H-%M-%S plus '.html'.

There was one complication. I got some errors while writing the file out, which turned out to be an issue with Unicode encoding—apparently certain cites that I pasted into Tumblr contain characters that can’t be converted to ASCII, which is the default encoding for writing out a file. The solution there is to use the codecs module to convert. (It’s possible that this is a problem only in Python 2.x.)
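A minimal sketch of the codecs fix, pulled out of the larger script (the filename here is made up; codecs.open behaves the same way in Python 2 and 3):

```python
import codecs

# A string with characters outside ASCII (e.g., curly quotes pasted
# from the web) fails with Python 2's default ASCII codec on write.
text = u'Tumblr post with \u201ccurly quotes\u201d and an em dash \u2014 here'

# codecs.open returns a file object that encodes on the way out, so
# writing Unicode works regardless of the default encoding.
with codecs.open('tumblr_sample.html', 'w', 'utf-8') as f:
    f.write(text)

# Reading it back through the same codec round-trips cleanly.
with codecs.open('tumblr_sample.html', 'r', 'utf-8') as f:
    assert f.read() == text
```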

Here’s the complete listing for the Python script. (I wrapped some of the lines in a Python-legal way to squeeze them for the blog.)
import datetime, json, requests
import codecs  # For converting Unicode in source

class TumblrPost:
    def __init__(self,
                 post_url,
                 post_date,
                 post_text,
                 post_source,
                 post_title,
                 post_type):
        self.post_url = post_url
        self.post_date = post_date
        self.post_text = post_text
        self.post_source = post_source
        self.post_type = post_type
        if post_title is None or post_title == '':
            self.post_title = ''
        else:
            self.post_title = post_title

all_posts = []     # List to hold instances of the TumblrPost class
html_output = ''   # String to hold the formatted HTML for all the posts
folder_name = 'C:\\Tumblr\\'

# Get the text posts and add them as TumblrPost objects to the all_posts list
print "Fetching text entries ..."
base_url = 'http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/text?api_key=[MY_KEY]'
offset = 0
posts_still_left = True
while posts_still_left:
    # Rebuild the URL each time through so the offsets don't accumulate
    request_url = base_url + "&offset=" + str(offset)
    print "\tFetching text entries (%i) ..." % offset
    tumblr_response = requests.get(request_url).json()
    total_posts = tumblr_response['response']['total_posts']
    for post in tumblr_response['response']['posts']:
        # See https://www.tumblr.com/docs/en/api/v2#text-posts
        p = TumblrPost(post['post_url'],
                       post['date'],
                       post['body'], '',
                       post['title'],
                       'text')  # No source for text posts
        all_posts.append(p)
    offset += 20
    if offset > total_posts:
        posts_still_left = False

# Get the quote posts and add them as TumblrPost objects to the all_posts list.
print "Fetching quote entries ..."
base_url = 'http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/quote?api_key=[MY_KEY]'
offset = 0
posts_still_left = True
while posts_still_left:
    request_url = base_url + "&offset=" + str(offset)
    print "\tFetching quote entries (%i) ..." % offset
    tumblr_response = requests.get(request_url).json()
    total_posts = tumblr_response['response']['total_posts']
    for post in tumblr_response['response']['posts']:
        # See https://www.tumblr.com/docs/en/api/v2#quote-posts
        p = TumblrPost(post['post_url'],
                       post['date'],
                       post['text'],
                       post['source'], '',
                       'quote')  # No title for quote posts
        all_posts.append(p)
    offset += 20
    if offset > total_posts:
        posts_still_left = False

sorted_posts = sorted(all_posts,
                      key=lambda tpost: tpost.post_date,
                      reverse=True)

print "Creating HTML file ..."

# Read a file that contains the HTML layout of the posts,
# with placeholders for individual bits of data
html_file = open(folder_name + 'tumblr_block_template.html', 'r')
html_block_template = html_file.read()
html_file.close()

ctr = 0
for p in sorted_posts:
    new_html_block = html_block_template
    ctr += 1
    new_html_block = new_html_block.replace('%%%posttitle%%%', p.post_title)
    new_html_block = new_html_block.replace('%%%postdate%%%', p.post_date)
    new_html_block = new_html_block.replace('%%%posttext%%%', p.post_text)
    new_html_block = new_html_block.replace('%%%postsource%%%', p.post_source)
    new_html_block = new_html_block.replace('%%%posturl%%%', p.post_url)
    new_html_block = new_html_block.replace('%%%postcounter%%%', str(ctr))
    new_html_block = new_html_block.replace('%%%posttype%%%', p.post_type)
    html_output += new_html_block

# The template has a placeholder for the content that's generated dynamically
html_file = open(folder_name + 'tumblr_template.html', 'r')
html_file_contents = html_file.read()
html_file.close()
html_file_contents = html_file_contents.replace('%%%posts%%%', html_output)

# Open (i.e., create) a new file with the ability to write Unicode.
# See http://stackoverflow.com/questions/934160/write-to-utf-8-file-in-python
file_timestamp = datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
with codecs.open(folder_name +
                 'tumbl_bu-' +
                 file_timestamp +
                 '.html', 'w', "utf-8-sig") as new_html_file:
    new_html_file.write(html_file_contents)

print 'Done!'



  09:17 PM

I wonder how many people do this. Let’s say I’m reading something on Wikipedia, and a paragraph includes a link that’s seductively drawing my attention away from the current article. In a show of resistance to ADHD, I won’t just click that link—instead, I’ll Ctrl+click it, thus opening the linked page in another tab “for later.”

After some amount of reading, I’ll have, oh, a dozen tabs open in the browser:


Or 20. Or 30. In another exhibit of discipline, I will occasionally drag all of these open tabs from the many and various browser windows I have open into a single browser window. Now, that’s organized.

Perhaps it’s the “for later” part that I’m wondering about. I just checked some of the pages in those tabs in the screenshot. As near as I can tell, the oldest one goes back about three months. Here’s a sampling of the pages I currently have open:
  • The Tumblr API reference
  • Three (!) articles on time perspective.
  • An article on how to use Twitter for business.
  • The article “Complements and Dummies” by John Lawler, a linguist.
  • An article on high-impact training in 4 minutes.
  • An article on how to create effective to-do lists.
  • An article on how to adjust the aim of the headlight on a motorcycle.
  • The syllabus, wiki, and video page for a Coursera course I’m taking.
  • A Wikipedia article about the 1952 steel strike (related to the previous).
You can see that these are all pages that I want to keep handy, ready to read when I get a few spare minutes.

My officemate and I were talking about this today, and it turns out he does something similar. My collection of open tabs has survived several computer reboots (thanks, Chrome!), and my officemate confirms that his collection has persisted through a number of upgrades to Firefox.

It seems like a logical approach would be to bookmark these pages, either in the browser, or using some sort of bookmarking site like Pinterest or (ha) Delicious. Or heck, OneNote or EverNote.

But in my case, tossing a link into any of these is almost the equivalent of throwing it into a black hole. Yes, I have the link, but I don’t make a habit of going back to my saved links and looking for things that had struck my fancy days or weeks or months ago.

No, the habit of keeping these pages open seems to act as a kind of short-term bookmarking. Now and then I might actually click on one of the tabs just to remind myself of why I have all these pages open. For the most part, any given page still looks interesting, so I don’t want to close it. After all, I still intend to read that page Real Soon Now.



  12:14 PM

This is part 2. (See part 1.)

Previously on Playing with the Tumblr API:

“I have a Tumblr blog …”
“Tumblr’s search feature is virtually useless …”
“However, Tumblr does support a nice RESTful API …”
“I wanted to build an HTML page out of my posts …”


As I described last time, once you’ve registered an app and gotten a client ID and secret key, it’s very easy to call the Tumblr API to read posts:
http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/text?api_key=secret-key
This API returns up to 20 posts at a time. Each response includes information about how many total posts match your request criteria. So to get all the posts, you make this request in a loop, adding an offset value that you increment each time, and stopping when you’ve hit the total number of posts. Here’s one way to do that in Python:
import requests  # requests handles the JSON decoding for us

base_url = 'http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/text?api_key=key'
offset = 0
posts_still_left = True
while posts_still_left:
    # Rebuild the URL each time through so the offsets don't accumulate
    request_url = base_url + "&offset=" + str(offset)
    tumblr_response = requests.get(request_url).json()
    total_posts = tumblr_response['response']['total_posts']
    for post in tumblr_response['response']['posts']:
        pass  # Do something with the JSON info here
    offset += 20
    if offset > total_posts:
        posts_still_left = False
I’m using the awesome requests library (motto: “HTTP for Humans”) to make the API requests. The response is in JSON. The raw return value from requests.get is a requests.models.Response object, but its json() method makes it easy to convert the payload to a dict. You can then easily pluck out the values you want. Here, for example, I’m extracting the value of the total_posts field. Inside the response element there’s a posts array that contains the guts of each of the (up to) 20 posts that the response returns.
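To make the plucking concrete, here’s a toy version with a hard-coded stand-in for the decoded response (the field names are the real ones used in the code above; the values are invented):

```python
# A trimmed stand-in for the dict that requests.get(...).json() returns.
tumblr_response = {
    'response': {
        'total_posts': 42,
        'posts': [
            {'post_url': 'http://mikepope.tumblr.com/post/1',
             'date': '2015-04-01 12:00:00 GMT',
             'title': 'First post',
             'body': 'Some text.'},
        ],
    },
}

# Pluck out the values of interest with ordinary dict/list indexing.
total_posts = tumblr_response['response']['total_posts']
print(total_posts)   # 42
for post in tumblr_response['response']['posts']:
    print(post['post_url'])   # the post's permalink
```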

Normalizing Post Info

I noted before that I was interested in 2 (text, quote) of the 8 types of posts that Tumblr supports, and that different post types return somewhat different info. The JSON info for Tumblr posts contains a lot of information—much of it metadata like state (published, queued), note_count, tags, and some other stuff that, while essential to Tumblr’s purposes, did not interest me personally. I’m interested in just these things: post_url, date, title, and body (text posts) or source (quote posts).

To normalize this information, I fell back on old habits: I created a TumblrPost class in Python and defined members that accommodated all of the JSON values I was interested in across both post types:
class TumblrPost:
    def __init__(self, post_url, post_date, post_text, post_source, post_title):
        self.post_url = post_url
        self.post_date = post_date
        self.post_text = post_text
        self.post_source = post_source
        if post_title is None or post_title == '':
            self.post_title = ''
        else:
            self.post_title = post_title
Should I want at some point to accommodate additional types of posts, I can add members to this class. I guess.

Having this class lets me read the raw JSON in a loop and create an instance of the class for each Tumblr post I read. I can then just add the new instance to a Python list. My code to read text posts looks like the following:
all_posts = []      # List to hold instances of the TumblrPost class

base_url = 'http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/text?api_key=key'
offset = 0
posts_still_left = True
while posts_still_left:
    # Rebuild the URL each time through so the offsets don't accumulate
    request_url = base_url + "&offset=" + str(offset)
    print "\tFetching text entries (%i) ..." % offset
    tumblr_response = requests.get(request_url).json()
    total_posts = tumblr_response['response']['total_posts']
    for post in tumblr_response['response']['posts']:
        p = TumblrPost(post['post_url'], post['date'], post['body'], '', post['title'])
        all_posts.append(p)
    offset += 20
    if offset > total_posts:
        posts_still_left = False

Reading both Text and Quote Posts

So that took care of reading text posts. As I say, quote posts have a slightly different JSON layout, such that reading the JSON and instantiating a TumblrPost instance looks like this (no body, but a source):
p = TumblrPost(post['post_url'], post['date'], post['text'], post['source'], '')
I debated whether to tweak my loop logic to accommodate both text-type and quote-type requests in the same loop. In that case, the loop has to a) issue a slightly different request (with /quote? instead of /text?) and then b) extract slightly different JSON when creating the TumblrPost instances. This would require a variable to track which type of post I was reading and then some if logic to branch to the appropriate request and appropriate instantiation logic. Bah. In the end, I just copied this loop (*gasp!*) and changed the couple of affected lines.
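For the record, the branch-free version I decided against could be sketched with a small type-to-fields map instead of if logic. This is hypothetical code, not what I actually ran; FIELD_MAP and make_post are names I’m inventing here, though the JSON field names are the real ones:

```python
# Map each post type to the JSON field that carries each piece of data;
# None means that type doesn't have the field and we substitute ''.
FIELD_MAP = {
    'text':  {'text': 'body', 'source': None,     'title': 'title'},
    'quote': {'text': 'text', 'source': 'source', 'title': None},
}

def make_post(post_type, post_json):
    """Normalize one raw JSON post into a common layout
    (url, date, text, source, title) regardless of its type."""
    fields = FIELD_MAP[post_type]
    def grab(key):
        return post_json[fields[key]] if fields[key] else ''
    return {
        'post_url': post_json['post_url'],
        'post_date': post_json['date'],
        'post_text': grab('text'),
        'post_source': grab('source'),
        'post_title': grab('title'),
    }

text_post = {'post_url': 'u1', 'date': 'd1', 'body': 'b', 'title': 't'}
quote_post = {'post_url': 'u2', 'date': 'd2', 'text': 'q', 'source': 's'}
print(make_post('text', text_post)['post_text'])      # b
print(make_post('quote', quote_post)['post_source'])  # s
```

One loop can then call make_post(post_type, post) for both request types; whether that’s clearer than two copied loops is a matter of taste.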

Next up: Creating the actual HTML, and then done, whew.



  10:43 PM

I have a Tumblr blog where I stash interesting (to me) quotes and citations that I've run across in my readings. Tumblr has some nice features, including a queue that lets you schedule a posting for "next Tuesday" or a date and time that you pick.


Tumblr Woes

However, Tumblr’s search feature is virtually useless, which is painful when I want to find something I posted in the distant past. As near as I can tell, their search looks only for tags, and even then (again, AFAICT) it doesn't scope the search to just one (my) blog.

In theory, I can display the blog and use the browser's Ctrl+F search to look for words. Tumblr supports infinite scroll, but, arg, in such a way that Ctrl+F searches cannot find posts that I can plainly see right in the browser.

When search proved useless, I thought I might be able to download all my posts and then search them locally. However, and again AFAICT, Tumblr has no native support for exporting your posts. There once was a utility/website that someone had built that allowed you to export your blog, but it's disappeared.[1]

APIs to the Rescue

However, Tumblr does support a nice RESTful API. Since I've been poking around a bit with Python, it seemed like an interesting project to write a Python script to make up for these Tumblr deficiencies. I initially thought I'd write a search script, but I ended up writing a script to export my particular blog to an HTML file, which actually solves both of my frustrations—search and export/backup.

Like other companies, Tumblr requires you to register your application (e.g. "mike's Tumblr search") and in exchange they give you an OAuth key that consists of a "consumer key" and a "secret key." You use these keys (most of the time) to establish your bona fides to Tumblr when you make requests using the API.

(Side note: They basically have three levels of auth. Some APIs require no key; some require just the secret key; and some require that you use the Tumblr keys in conjunction with OAuth to get a temporary access key. This initially puzzled me, but it soon became clear that their authentication/authorization levels correspond with how public the information is that you want to work with. To get completely public info, like the blog avatar, requires no auth. In contrast, using the API to post to the blog or edit a post requires full-on OAuth.)

Tasks I Needed to Perform

The mechanics of what I wanted to do—namely, get all the posts—are trivially easy. For example, to get a list of posts of type "text" (more on that in a moment), you do this:

http://api.tumblr.com/v2/blog/mikepope.tumblr.com/posts/text?api_key=secret-key

This returns 20 posts' worth of information in a JSON block that's well documented, and which includes paging markers so that you can get the next 20 posts, etc. In a narrow sense, all I needed to do was to issue a request in a loop to get the blog posts page by page, concatenate everything together, and write it out. I’d then have a “backup”—or at least a copy, even if it was a big ol’ JSON mess—of my entries, and these would be somewhat searchable.

As it happens, you use different queries to get different types of posts. Tumblr supports eight types of posts—text, quote, link, answer, video, audio, photo, chat. Each type requires a separate query[2], and each returns a slightly different block of JSON. For just basic read-and-dump, it’s a matter of looping through again, but this time with a slightly different query.
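The read-and-dump loop described above reduces to a skeleton like this (a sketch: fetch is a stand-in for the real requests.get(...).json() call, so the paging logic can be exercised without the network):

```python
def dump_all_posts(fetch, page_size=20):
    """Page through the API `page_size` posts at a time and return the
    concatenated list. `fetch(offset)` stands in for the real
    requests.get(url + '&offset=' + str(offset)).json() call."""
    all_posts = []
    offset = 0
    while True:
        response = fetch(offset)['response']
        all_posts.extend(response['posts'])
        offset += page_size
        if offset > response['total_posts']:
            break
    return all_posts

# Fake a 45-post blog served in pages of 20.
def fake_fetch(offset):
    posts = list(range(45))[offset:offset + 20]
    return {'response': {'total_posts': 45, 'posts': posts}}

print(len(dump_all_posts(fake_fetch)))   # 45
```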

So that’s the basics. As noted, I got this idea that I wanted to build an HTML page out of my posts, and that complicated things. But not too terribly much. (I’m using Python, after all, haha). More on that soon.

Update: Part 2 is now up.

[1] One of the original reasons I got interested in writing this blog, in fact, was that LiveJournal did not support any form of search way back in 2001.

[2] Their docs suggest that if type is left blank, they'll return everything, but that was not my experience.
