Unicode and Code Points, UTF-8 and Encodings, Charlists, References
You can use ? before a character literal to see its code point:
iex> ?a
97
iex> ?ł
322
iex> "\u0061" === "a"
true
iex> 0x0061 = 97 = ?a
97
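The relationship between characters, code points, and strings can be sketched a little further. This is a minimal illustration using only Elixir's standard library; the variable names are chosen for the example and do not come from the original notes.

```elixir
# ?x is the code point of the character x, as a plain integer.
code_point = ?ł
IO.inspect(code_point)                    # prints 322

# <<n::utf8>> builds the UTF-8 string for code point n.
IO.inspect(<<code_point::utf8>> == "ł")   # prints true

# \uXXXX inside a string names a code point in hexadecimal (0x0142 is 322).
IO.inspect("\u0142" == "ł")               # prints true

# A binary comprehension walks a string one code point at a time.
code_points = for <<cp::utf8 <- "hełło">>, do: cp
IO.inspect(code_points, charlists: :as_lists)
# prints [104, 101, 322, 322, 111]
```

The `charlists: :as_lists` option is passed so the integers are always shown as numbers, whatever their values.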
Elixir uses UTF-8 to encode its strings, which means that code points are encoded as a series of 8-bit bytes.
String.length/1 counts graphemes, but byte_size/1 reveals the number of underlying raw bytes needed to store the string when using UTF-8. UTF-8 requires one byte to represent the characters h, e, and o, but two bytes to represent ł:
iex> string = "hełło"
iex> String.length(string)
5
iex> byte_size(string)
7
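To see where the two extra bytes come from, the string can be broken down into its raw bytes and into a byte count per grapheme. This is a minimal sketch using standard library calls; the variable names are illustrative only.

```elixir
string = "hełło"

IO.inspect(String.length(string))   # prints 5 (graphemes)
IO.inspect(byte_size(string))       # prints 7 (bytes)

# The raw bytes: each ł (code point 322) is stored as the two bytes 197, 130.
bytes = :binary.bin_to_list(string)
IO.inspect(bytes, charlists: :as_lists)
# prints [104, 101, 197, 130, 197, 130, 111]

# Byte cost of each grapheme.
per_grapheme = for g <- String.graphemes(string), do: {g, byte_size(g)}
IO.inspect(per_grapheme)
# prints [{"h", 1}, {"e", 1}, {"ł", 2}, {"ł", 2}, {"o", 1}]
```

Three one-byte characters plus two two-byte characters give the 7 bytes reported by byte_size/1.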
A charlist is a list of integers where all the integers are valid code points:
iex> 'hełło'
[104, 101, 322, 322, 111]
iex> is_list 'hełło'
true
iex> 'hello'
'hello'
iex> List.first('hello')
104
iex> heartbeats_per_minute = [99, 97, 116]
'cat'
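The heartbeats_per_minute result can be surprising: the value is still a list of integers, and only its printed form looks like text. As far as I understand it, a list is displayed as a charlist when every element is a printable ASCII character, which is why 'hełło' above is shown as numbers (322 is outside ASCII). The sketch below assumes List.ascii_printable?/1 is available in your Elixir version, and uses the ~c sigil, which builds the same charlist as the single-quoted form.

```elixir
heartbeats_per_minute = [99, 97, 116]

# The list and the charlist are the same value.
IO.inspect(heartbeats_per_minute == ~c"cat")                # prints true

# Every element is printable ASCII, so the list is displayed as text.
IO.inspect(List.ascii_printable?(heartbeats_per_minute))    # prints true

# Ask inspect to always show the integers instead.
IO.inspect(heartbeats_per_minute, charlists: :as_lists)     # prints [99, 97, 116]

# 322 (ł) is not ASCII, so this charlist is displayed as a list of integers.
IO.inspect(List.ascii_printable?(~c"hełło"))                # prints false
```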
iex> to_charlist "hełło"
[104, 101, 322, 322, 111]
iex> to_string 'hełło'
"hełło"
iex> to_string :hello
"hello"
iex> to_string 1
"1"
iex> 'this ' <> 'fails'
** (ArgumentError) expected binary argument in <> operator but got: 'this '
    (elixir) lib/kernel.ex:1821: Kernel.wrap_concatenation/3
    (elixir) lib/kernel.ex:1808: Kernel.extract_concatenations/2
    (elixir) expanding macro: Kernel.<>/2
    iex:1: (file)
iex> 'this ' ++ 'works'
'this works'
iex> "he" ++ "llo"
** (ArgumentError) argument error
    :erlang.++("he", "llo")
iex> "he" <> "llo"
"hello"
Binaries, strings, and charlists: https://elixir-lang.org/getting-started/binaries-strings-and-char-lists.html