Be Genius

me

Bo Jeanes

I am an software engineer who has lived and worked in New York, Brisbane, Chicago, San Francisco, and now Melbourne. I primarily work in Ruby though am a functional programmer at heart and a fan of programming languages in general. In particular, I love Rust and Clojure but keep my eye on many langauges all the time.

Unescaping UTF-8 Strings in Ruby 1.9

Today Radar and I encountered a little issue with String encodings in Ruby 1.9. In this project, some Merb UTF-8 params needed to be unescaped. We spent some trying to force strings into UTF-8 encoding but for some reason while the encoding was UTF–8, the actual contents of the strings were getting massacred.

Long story short:

# In Ruby 1.9

# Broken
puts URI::unescape # => "Baden-Württemberg"

# Working
puts CGI::unescape # => "Baden-Württemberg"

Another lesson we learnt on our adventures today is the following:

m1 = "Munich"
puts = m1.encoding  # let's suppose it is something other than UTF-8, such as ASCII-8Bit

m2 = "München"
puts = m2.encoding  # let's suppose we have a UTF-8 encoded string


# The following will raise an exception about mismatched encodings
m3 = m1 << m2

# The easiest way to get around this is to do the concatenation like so:
m3 = m3 = m1 << m2.force_encoding

Comments

None yet

You need to login with GitHub in order to comment.