Father's Day Tokenizer


holiday: Happy Father's Day.

java: Certy is offering various free online Java certifications.

java: I've been trying to make use of the PowerfulTokenizer I mentioned a few days ago. The author had the right idea, but his implementation is incomplete and riddled with bugs.

I started fixing it, but quickly realized it would be faster to roll my own.

Considering our particular needs, it made more sense to implement it as part of the jTalk Utility class. The gist of it goes something like this:

// The delimited string
String csv = ",2,3,\"4, 4.1\",5,,7,\"\"5, 5.1\"\"";
// The delimiter (i.e.: comma, tab, etc.)
String delim = ",";

// Tokenize the data
StringTokenizer st = new StringTokenizer(csv, delim, true);

// The buffer used to hold substrings
StringBuffer buffer = new StringBuffer(0);

// The current token
String token;
// The previous token
String oldToken = delim;

// Loop through the tokens
while (st.hasMoreTokens())
{
  // Get the current token
  token = st.nextToken();

  // If the current token and old token are both equal to the
  // delimiter the current value is blank

  if (token.equals(delim))
  {
    if (oldToken.equals(delim))
    {
      // Output a blank value
      System.out.println("");
    }
  }
  else
  {
    // A substring always starts with a double quote
    if (token.charAt(0) == '"')
    {
      // Reset the buffer
      buffer.setLength(0);
      // Get the current substring chunk skipping the starting
      // double quote

      buffer.append(token.substring(1));
      
      // Loop until the end of the substring (or until the tokens run out)
      while (!token.endsWith("\"") && st.hasMoreTokens())
      {
        // Get the current substring chunk
        token = st.nextToken();
        // Append it to the buffer
        buffer.append(token);
      }
      
      // Output the buffer if not empty
      if (buffer.length() > 0)
      {
        // Output the buffer skipping the ending double quote
        System.out.println(
          buffer.substring(0, buffer.length() - 1));
      }
    }
    else
    {
      // Output the trimmed token
      System.out.println(token.trim());
    }
  }

  // Save the value of the token
  oldToken = token;
}
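
For reuse, the same loop can be wrapped in a method that returns the values instead of printing them. What follows is only a rough sketch, not the jTalk code: the class name QuotedTokenizer and the split() signature are made up for illustration. It also treats a lone double quote as the start of a substring, which covers quoted values that begin with the delimiter. Like the loop above, it does not emit a blank value for a trailing delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class: the name and signature are assumptions,
// not part of jTalk or of PowerfulTokenizer.
class QuotedTokenizer {

    // Splits a delimited line into values, keeping quoted substrings intact.
    static List<String> split(String line, String delim) {
        List<String> values = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, delim, true);
        StringBuilder buffer = new StringBuilder();
        String oldToken = delim;

        while (st.hasMoreTokens()) {
            String token = st.nextToken();

            if (token.equals(delim)) {
                // Two delimiters in a row (or a leading one) mean a blank value
                if (oldToken.equals(delim)) {
                    values.add("");
                }
            } else if (token.charAt(0) == '"') {
                // Start of a quoted substring: skip the opening double quote
                buffer.setLength(0);
                buffer.append(token.substring(1));

                // A lone double quote only opens the substring, it cannot close it
                boolean closed = token.length() > 1 && token.endsWith("\"");
                while (!closed && st.hasMoreTokens()) {
                    token = st.nextToken();
                    buffer.append(token);
                    closed = token.endsWith("\"");
                }

                // Drop the closing double quote, if there is one
                if (closed) {
                    buffer.setLength(buffer.length() - 1);
                }
                values.add(buffer.toString());
            } else {
                values.add(token.trim());
            }

            oldToken = token;
        }
        return values;
    }
}
```

Calling QuotedTokenizer.split(csv, delim) with the sample string above should give eight values, the first and sixth being blank.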

You may also want to implement a method that automatically converts pairs of double quotes ("") to a single double quote ("), a common escaping convention in most export formats.
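
Something along these lines would probably do for the quote conversion. It is a minimal sketch: the QuoteUnescaper class and unescapeQuotes() method are hypothetical names, and it assumes the surrounding quotes of the substring have already been stripped by the tokenizer.

```java
// Hypothetical helper: the class and method names are assumptions for illustration.
class QuoteUnescaper {

    // Collapses each pair of double quotes ("") into one double quote (").
    static String unescapeQuotes(String value) {
        // String.replace scans left to right and never rescans its own output,
        // so four quotes in a row become two, not one.
        return value.replace("\"\"", "\"");
    }
}
```

It would be applied to each value after tokenizing, e.g. unescapeQuotes(buffer.toString()).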

us: We're going to give the Smokey Joe a try in the afternoon. Should be fun.

java: Version 1.4.1 of Jakarta Cactus, a server-side unit testing framework, has been released.