About

I'm Mike Pope. I live in the Seattle area. I've been a technical writer and editor for over 30 years. I'm interested in software, language, music, movies, books, motorcycles, travel, and ... well, lots of stuff.


  12:02 AM

Having finished up a pile of work-work, I can now return to the interesting suggestions raised by my recent word frequency entry. Simon suggested using a custom class that implements IComparable. That was new to me, so I gave it a try. It wasn't immediately obvious to me what to do, but with a little poking around, I found a number of examples, including in the .NET QuickStarts. (Who knew?)

To jump ahead a moment, you can see the result of the effort first. I now have two word frequency pages, the original that uses a DataTable and a new one that uses the custom class and array sorting. Try them out:

Word frequency with data table
Word frequency with custom class implementing IComparable

The version with a data table is no longer interesting from an implementation POV, but I was curious about timings.[1]

The custom class is a simple class with a textbook implementation of CompareTo. The mildly fun twist is that I coded the CompareTo method to sort by two values (first frequency, descending, then word, ascending).

Eric had some other suggestions. One was to use a Hashtable, which I couldn't figure out how to do. Putting instances of the custom class into a Hashtable worked, but Hashtable does not seem to support a Sort method. He's also got an implementation using generics, which are new in Whidbey and thus not possible yet in 1.1. If you're curious, though, have a look at his second comment.
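
For what it's worth, here is a minimal sketch of one way the Hashtable idea might be made to work. This is not Eric's code. It assumes the Hashtable is used only for counting (word as key, running count as value) and that the entries are then copied into an ArrayList, which does have a Sort method. The WordCount class and the HashtableSketch module are hypothetical names made up for this example. WordCount is a cut-down stand-in for the WordFrequency class shown below.

```vbnet
Imports System
Imports System.Collections

' Hypothetical, cut-down stand-in for the WordFrequency class in this post
Class WordCount : Implements IComparable
    Public Word As String
    Public Count As Integer

    Public Sub New(ByVal text As String, ByVal n As Integer)
        Me.Word = text
        Me.Count = n
    End Sub

    ' Order by count (descending), then by word (ascending, ordinal)
    Public Function CompareTo(ByVal obj As Object) As Integer _
            Implements IComparable.CompareTo
        Dim other As WordCount = CType(obj, WordCount)
        If other.Count <> Me.Count Then
            Return other.Count.CompareTo(Me.Count)
        End If
        Return String.CompareOrdinal(Me.Word, other.Word)
    End Function
End Class

Module HashtableSketch
    Sub Main()
        Dim words() As String = "the cat and the dog and the bird".Split()

        ' Tally each word: the word is the key, the running count is the value
        Dim counts As New Hashtable()
        For Each w As String In words
            If counts.ContainsKey(w) Then
                counts(w) = CInt(counts(w)) + 1
            Else
                counts(w) = 1
            End If
        Next

        ' Hashtable has no Sort method, so copy the entries into an ArrayList
        Dim sorted As New ArrayList()
        For Each entry As DictionaryEntry In counts
            sorted.Add(New WordCount(CStr(entry.Key), CInt(entry.Value)))
        Next
        sorted.Sort()

        For Each wc As WordCount In sorted
            Console.WriteLine(wc.Count.ToString() & " " & wc.Word)
        Next
    End Sub
End Module
```

For the sample sentence, this prints "3 the", "2 and", "1 bird", "1 cat", "1 dog", one pair per line. Counting this way also skips the step of sorting the raw word array first, since the Hashtable finds repeated words by key.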

Anyway, here's the code:
Sub Button1_Click(sender As Object, e As EventArgs)

    Dim startTime As DateTime = DateTime.Now
    Dim endTime As DateTime

    Dim i As Integer
    Dim s As String

    ' Characters to treat as word separators
    Dim punctuation() As Char = {".", ",", "!", "=", "-", _
        "_", ";", ":", "(", ")", "[", "]", """", "?", "/", "\", _
        "@", "#", "$", "%", "&", "*", "=", "<", ">", "|", _
        "~", "‘", "`"}
    Dim t As String = TextBox1.Text
    t = t.ToLower()
    t = t.Trim()
    For i = 0 To punctuation.Length - 1
        t = t.Replace(punctuation(i), " ")
    Next i
    t = t.Replace(vbCrLf, " ")
    t = t.Replace(vbTab, " ")

    ' Dumb old so-called smart quotes, grrr
    t = t.Replace(Chr(145), " ")
    t = t.Replace(Chr(146), "'") ' smart apostrophe
    t = t.Replace(Chr(147), " ")
    t = t.Replace(Chr(148), " ")
    t = t.Replace(Chr(151), " ")

    ' Collapse runs of spaces into a single space
    While t.IndexOf("  ") > -1
        t = t.Replace("  ", " ") ' double spaces
    End While
    ' Replaced punctuation at either end leaves a stray space, which
    ' Split would otherwise turn into an empty "word"
    t = t.Trim()

    ' Create array of all words
    Dim wordArray() As String
    wordArray = t.Split()
    Array.Sort(wordArray)

    Dim WordsByCount As New ArrayList()
    ' Walk through word array, accumulating count of (sorted)
    ' words. When the word changes, write the previous word and its
    ' accumulator to new array of custom WordFrequency objects.
    Dim arrayLength As Integer = wordArray.Length - 1
    Dim accumulator As Integer = 0
    Dim nextWord As String = ""
    Dim currentWord As String = ""

    For i = 0 To arrayLength
        nextWord = wordArray(i)
        If nextWord = currentWord Then
            accumulator += 1
        Else
            If i > 0 Then
                WordsByCount.Add(New WordFrequency(currentWord, accumulator))
            End If
            currentWord = nextWord
            accumulator = 1
        End If
    Next
    ' Add the final word, which the loop never gets to write
    WordsByCount.Add(New WordFrequency(currentWord, accumulator))

    ' Sort method invokes custom comparison method of objects in array
    WordsByCount.Sort()

    ' Display results
    s = "<table cellpadding=4>"
    For Each wf As WordFrequency In WordsByCount
        s &= "<tr>"
        s &= "<td>" & wf.Frequency & "</td>"
        s &= "<td>" & wf.Word & "</td>"
        s &= "</" & "tr>"
    Next
    s &= "</table>"
    Literal1.Text = s
    labelWordCount.Text = wordArray.Length.ToString()
    endTime = DateTime.Now
    Dim timeDiff As TimeSpan = endTime.Subtract(startTime)
    Dim totalSeconds As Double = (timeDiff.TotalMilliseconds / 1000)
    labelTime.Text = totalSeconds.ToString("g")
End Sub


Class WordFrequency : Implements IComparable
    Private WordValue As String
    Private FrequencyValue As Integer

    Public Sub New()
    End Sub

    Public Sub New(word As String, freq As Integer)
        Me.Word = word
        Me.Frequency = freq
    End Sub

    Public Property Word As String
        Get
            Return WordValue
        End Get
        Set (value As String)
            WordValue = value
        End Set
    End Property

    Public Property Frequency As Integer
        Get
            Return FrequencyValue
        End Get
        Set (value As Integer)
            FrequencyValue = value
        End Set
    End Property

    ' Sort by frequency (descending), then by word (ascending)
    Public Function CompareTo (ByVal ObjectToCompare As Object) As Integer _
            Implements IComparable.CompareTo
        Dim WordFrequencyObject As WordFrequency = _
            CType(ObjectToCompare, WordFrequency)
        CompareTo = WordFrequencyObject.Frequency - Me.Frequency
        If CompareTo = 0 Then
            ' Word frequencies are the same, so now compare words
            If WordFrequencyObject.Word < Me.Word Then
                CompareTo = 1
            ElseIf WordFrequencyObject.Word > Me.Word Then
                CompareTo = -1
            Else
                CompareTo = 0
            End If
        End If
    End Function
End Class

[1] The data table implementation seems to be marginally faster than the custom class, at least as I implemented them. I used a test of 10,996 words (the first two chapters of Dickens's David Copperfield), and in three trials got these timings, in seconds: DataTable - 7.23/6.0156/6.1093; custom class - 8.1098/6.578/6.48.
