Unicode and Code Points

Put ? in front of a character to get its code point:

iex> ?a
97
iex> ?ł
322
iex> "\u0061" === "a"
true
iex> 0x0061 = 97 = ?a
97
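The same code points can be collected for a whole string. In this minimal sketch (not taken from the guide), a binary comprehension with the `utf8` modifier walks the string one code point at a time. `Integer.to_string/2` then shows 322 in hexadecimal, the form the `\u` escape uses:

```elixir
# Walk the UTF-8 string one code point at a time and collect each integer.
code_points = for <<cp::utf8 <- "hełło">>, do: cp
IO.inspect(code_points)
# [104, 101, 322, 322, 111]

# 322 is 0x142 in hexadecimal, so "\u0142" is another way to write "ł".
IO.inspect(Integer.to_string(?ł, 16))
# "142"
IO.inspect("\u0142" == "ł")
# true
```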

UTF-8 and Encodings

Elixir uses UTF-8 to encode its strings, which means that code points are encoded as a series of 8-bit bytes.
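This minimal sketch shows those bytes directly. The module `Utf8Demo` and its `breakdown/1` function are hypothetical names invented for this example. It re-encodes each code point with `<<cp::utf8>>` and lists the resulting bytes with Erlang's `:binary.bin_to_list/1`. The `charlists: :as_lists` option stops `IO.inspect/2` from printing small byte lists such as `[104]` as charlists.

```elixir
defmodule Utf8Demo do
  # Hypothetical helper: for each code point in `string`, return
  # {single-character string, code point, list of its UTF-8 bytes}.
  def breakdown(string) when is_binary(string) do
    for <<cp::utf8 <- string>> do
      encoded = <<cp::utf8>>
      {encoded, cp, :binary.bin_to_list(encoded)}
    end
  end
end

Enum.each(Utf8Demo.breakdown("hełło"), &IO.inspect(&1, charlists: :as_lists))
# {"h", 104, [104]}
# {"e", 101, [101]}
# {"ł", 322, [197, 130]}
# {"ł", 322, [197, 130]}
# {"o", 111, [111]}

# The whole string as raw bytes: 7 of them.
IO.inspect(:binary.bin_to_list("hełło"), charlists: :as_lists)
# [104, 101, 197, 130, 197, 130, 111]
```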

String.length/1 counts graphemes, but byte_size/1 reveals the number of underlying raw bytes needed to store the string in UTF-8. UTF-8 needs one byte each for h, e, and o but two bytes for ł, so the five-grapheme string "hełło" takes 3 + 2 × 2 = 7 bytes.

iex> string = "hełło"
"hełło"
iex> String.length(string)
5
iex> byte_size(string)
7
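Counting graphemes and counting bytes can diverge further than in "hełło". In this sketch the example string is chosen here, not taken from the guide. An "é" built from a plain "e" followed by U+0301 COMBINING ACUTE ACCENT is one grapheme but two code points and three bytes. The precomposed U+00E9 is one code point and two bytes:

```elixir
# "é" as "e" plus U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
IO.inspect(String.length(decomposed))             # 1 grapheme
IO.inspect(length(String.codepoints(decomposed))) # 2 code points
IO.inspect(byte_size(decomposed))                 # 3 bytes: 1 for "e", 2 for U+0301

# "é" as the single precomposed code point U+00E9.
precomposed = "\u00E9"
IO.inspect(String.length(precomposed))            # 1 grapheme
IO.inspect(byte_size(precomposed))                # 2 bytes

IO.inspect(decomposed == precomposed)             # false
```

Both strings display as "é", yet `==` compares the underlying bytes, so they are not equal.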

Charlist

A charlist is a list of integers in which every integer is a valid code point.

iex> 'hełło'
[104, 101, 322, 322, 111]
iex> is_list 'hełło'
true
iex> 'hello'
'hello'
iex> List.first('hello')
104
iex> heartbeats_per_minute = [99, 97, 116]
'cat'
iex> to_charlist "hełło"
[104, 101, 322, 322, 111]
iex> to_string 'hełło'
"hełło"
iex> to_string :hello
"hello"
iex> to_string 1
"1"
iex> 'this ' <> 'fails'
** (ArgumentError) expected binary argument in <> operator but got: 'this '
    (elixir) lib/kernel.ex:1821: Kernel.wrap_concatenation/3
    (elixir) lib/kernel.ex:1808: Kernel.extract_concatenations/2
    (elixir) expanding macro: Kernel.<>/2
    iex:1: (file)
iex> 'this ' ++ 'works'
'this works'
iex> "he" ++ "llo"
** (ArgumentError) argument error
    :erlang.++("he", "llo")
iex> "he" <> "llo"
"hello"

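The charlist definition above can also be checked in code. This minimal sketch assumes "valid code point" means a Unicode scalar value: 0 to 0x10FFFF, excluding the surrogate range 0xD800 to 0xDFFF that UTF-8 cannot encode. `CharlistCheck.valid?/1` is a hypothetical helper, not part of Elixir's standard library. The last line explains why `[99, 97, 116]` came back as 'cat'. iex prints a list of printable ASCII integers as a charlist, and the `charlists: :as_lists` inspect option turns that off.

```elixir
defmodule CharlistCheck do
  # Hypothetical helper (not in Elixir's standard library).
  # Assumes "valid code point" = Unicode scalar value:
  # 0..0x10FFFF minus the surrogates 0xD800..0xDFFF.
  def valid?(list) when is_list(list), do: Enum.all?(list, &scalar_value?/1)
  def valid?(_other), do: false

  defp scalar_value?(cp) when is_integer(cp) and cp in 0..0xD7FF, do: true
  defp scalar_value?(cp) when is_integer(cp) and cp in 0xE000..0x10FFFF, do: true
  defp scalar_value?(_), do: false
end

IO.inspect(CharlistCheck.valid?(to_charlist("hełło"))) # true
IO.inspect(CharlistCheck.valid?([99, 97, 116]))        # true
IO.inspect(CharlistCheck.valid?([104, -1]))            # false: negative integer
IO.inspect(CharlistCheck.valid?([0xD800]))             # false: surrogate
IO.inspect(CharlistCheck.valid?("hello"))              # false: a binary is not a list

# Print the integers themselves instead of a charlist.
IO.inspect([99, 97, 116], charlists: :as_lists)        # [99, 97, 116]
```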
References

Binaries, strings, and charlists: https://elixir-lang.org/getting-started/binaries-strings-and-char-lists.html