BC's Strings.fromUTF8ByteArray

Discussion:

Dominik Schuermann

2014-09-15 07:15:35 UTC

Hi list,

I would like to know if there is any difference between using Bouncy
Castle's function:
Strings.fromUTF8ByteArray(input)

and the Java built in function:
CharsetDecoder cs = Charset.forName("UTF-8").newDecoder();
cs.decode(ByteBuffer.wrap(input));

in terms of output/parsing behavior.

In our project OpenKeychain [0] we sometimes have to deal with OpenPGP
user ids with bad encodings. The Java built-in functions give me much
more feedback on what went wrong, so I would like to switch to them
for converting raw user ids from byte arrays to Strings. Are there
arguments against this, why we should stay with BC's
Strings.fromUTF8ByteArray(input) ?
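To illustrate the "more feedback" point, here is a minimal sketch that assumes only the JDK's java.nio.charset API. The class and method names are hypothetical. A decoder obtained from newDecoder() reports malformed input by default. When that happens, the input buffer is left at the start of the bad sequence, so you can tell where decoding failed. MalformedInputException.getInputLength() tells you how many bytes were bad.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;

public class StrictUserIdCheck {

    // Hypothetical helper: returns -1 if the input is well-formed UTF-8,
    // otherwise the byte offset where the first malformed sequence starts.
    static int firstMalformedOffset(byte[] input) {
        // REPORT is already the default for newDecoder(); set explicitly for clarity.
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = ByteBuffer.wrap(input);
        try {
            decoder.decode(buffer);
            return -1;
        } catch (MalformedInputException e) {
            // On a reported malformed-input error the buffer position points at
            // the start of the bad sequence; e.getInputLength() is its length.
            return buffer.position();
        } catch (CharacterCodingException e) {
            // Unmappable-character errors are not expected when decoding UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] good = {'A', 'l', 'i', 'c', 'e'};
        byte[] bad = {'A', 'B', (byte) 0xFF}; // 0xFF never occurs in valid UTF-8
        System.out.println(firstMalformedOffset(good)); // -1
        System.out.println(firstMalformedOffset(bad));  // 2
    }
}
```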

Regards
Dominik

[0] http://www.openkeychain.org/

David Hook

2014-09-17 05:35:35 UTC


In our case the main reason for it is that not all JVMs we support have
what you've described below.

With OpenPGP the only thing I'd be careful of is a few people have
reported issues with assuming UTF8, so we have added a getRawUserIDs()
which returns them as byte arrays. You might want to be cautious about
assuming UTF8 (yes, I know that's what the RFC says even as far back as
1998, but...).
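One way to act on that caution is sketched below. It assumes the raw bytes come from something like getRawUserIDs(). The code tries strict UTF-8 first and falls back to a legacy charset only when that fails. The ISO-8859-1 fallback and the helper name are hypothetical choices for illustration. Bouncy Castle does not do this.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class UserIdFallbackDecode {

    // Hypothetical helper: strict UTF-8 first, ISO-8859-1 as a legacy fallback.
    static String decodeUserId(byte[] raw) {
        try {
            return Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // ISO-8859-1 maps every byte to a char, so this cannot fail,
            // but it is only a guess at the encoding the key was created with.
            return new String(raw, Charset.forName("ISO-8859-1"));
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = {'M', (byte) 0xC3, (byte) 0xBC, 'l', 'l', 'e', 'r'};
        byte[] latin1 = {'M', (byte) 0xFC, 'l', 'l', 'e', 'r'};
        // Both inputs decode to "M\u00fcller"; the second one via the fallback.
        System.out.println(decodeUserId(utf8).equals(decodeUserId(latin1)));
    }
}
```

Keep the original bytes around for anything signature-related, because re-encoding a guessed string will not necessarily reproduce them.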

Regards,

David

Dominik Schuermann

2014-09-17 12:43:46 UTC


Hi David,

thanks for your answer.

Post by David Hook
In our case the main reason for it is that not all JVMs we support
have what you've described below.

I had already assumed that was the reason for the method; thanks for
confirming it.

Post by David Hook
With OpenPGP the only thing I'd be careful of is a few people have
reported issues with assuming UTF8, so we have added a
getRawUserIDs() which returns them as byte arrays. You might want
to be cautious about assuming UTF8 (yes, I know that's what the RFC
says even as far back as 1998, but...).

Yes, we encountered that and switched to raw user IDs for verifying
signatures. For display purposes, we now use Java's method because
it can replace undecodable bytes with the Unicode replacement
character (U+FFFD, the "Unicode question mark") and carry on past them.
BC's Strings.fromUTF8ByteArray handled broken encodings badly.
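A minimal sketch of the display-only decoding described above follows. It assumes only the JDK, and the helper name is hypothetical. With CodingErrorAction.REPLACE, each malformed sequence is replaced by the decoder's replacement string, which defaults to U+FFFD. CodingErrorAction.IGNORE would instead drop the bad bytes silently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class DisplayUserIdDecode {

    // Hypothetical helper for display only: never fails on bad input.
    static String decodeForDisplay(byte[] raw) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                // Substitute the replacement string (U+FFFD by default)
                // for each malformed sequence and keep decoding.
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not reachable with REPLACE actions, but decode() declares it.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] broken = {'A', 'B', (byte) 0xFF};
        System.out.println(decodeForDisplay(broken).equals("AB\uFFFD")); // true
    }
}
```

The decoded string is only for showing to the user. Signature checks should still work on the raw bytes.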

Regards
Dominik