Wednesday, August 1, 2012

io: char-streams

Terminology:


    Character set 
    A set of characters, i.e., symbols with specific semantic meanings. The letter "A" is a character. So is "%". Neither has any intrinsic numeric value, nor any direct relationship to ASCII, Unicode, or even computers. Both symbols existed long before the first computer was invented.

    Coded character set
    An assignment of numeric values to a set of characters. Assigning codes to characters so they can be represented digitally results in a specific set of character codings. Other coded character sets might assign a different numeric value to the same character. Such mappings are usually defined by standards, for example US-ASCII, ISO 8859-1, Unicode (ISO 10646-1), and JIS X0201.
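    A minimal sketch of "different numeric values for the same character": it prints the Unicode code of the Cyrillic letter U+0416 and the value three single-byte Cyrillic coded character sets assign to it. For single-byte charsets the code value and the encoded byte coincide, so encoding one character exposes its code. The class name CodedCharsets is illustrative, and the sketch assumes a JDK that ships the extended charsets windows-1251, KOI8-R and ISO-8859-5 (standard builds do, but the Java spec only guarantees US-ASCII, ISO-8859-1 and the UTF charsets).

```java
import java.nio.charset.Charset;

public class CodedCharsets {
    // Returns the value a single-byte charset assigns to ch, as an unsigned int.
    // Caveat: if the charset lacks ch, getBytes() silently substitutes '?' (0x3F).
    static int singleByteCode(char ch, String charsetName) {
        byte[] bytes = String.valueOf(ch).getBytes(Charset.forName(charsetName));
        if (bytes.length != 1) {
            throw new IllegalArgumentException(charsetName + " did not produce a single byte");
        }
        return bytes[0] & 0xFF;
    }

    public static void main(String[] args) {
        char zhe = '\u0416'; // CYRILLIC CAPITAL LETTER ZHE
        System.out.printf("Unicode:      U+%04X%n", (int) zhe);
        for (String name : new String[] {"windows-1251", "KOI8-R", "ISO-8859-5"}) {
            System.out.printf("%-13s 0x%02X%n", name + ":", singleByteCode(zhe, name));
        }
    }
}
```

    On a standard JDK this should print U+0416, then 0xC6, 0xF6 and 0xB6: one character, four different codes.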

    Character-encoding scheme
    A mapping of the members of a coded character set to a sequence of octets (eight-bit bytes). The encoding scheme defines how a sequence of character codes is represented as a sequence of bytes. The encoded bytes need not have the same numeric values as the character codes, and the relationship between them need not even be one-to-one or one-to-many. Think of character set encoding and decoding as similar in principle to object serialization and deserialization.
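    To make the serialization analogy concrete, here is a small sketch that takes two Unicode characters (U+0041 and U+0416, the same code values every time) and serializes them with three encoding schemes, then checks that decoding with the same scheme restores them. It relies only on java.nio.charset.StandardCharsets (Java 7+); the class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSchemes {
    // Encodes text with the given charset and formats the bytes as hex, e.g. "41 D0 96".
    static String encodeHex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+0041 (LATIN CAPITAL LETTER A) followed by U+0416 (CYRILLIC CAPITAL LETTER ZHE):
        // the same two Unicode code values, serialized by three different encoding schemes.
        String text = "A\u0416";
        Charset[] schemes = {StandardCharsets.UTF_8, StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE};
        for (Charset cs : schemes) {
            // Decoding with the same scheme restores the characters, like deserialization.
            boolean roundTrip = new String(text.getBytes(cs), cs).equals(text);
            System.out.printf("%-9s %-12s round trip: %b%n", cs.name(), encodeHex(text, cs), roundTrip);
        }
    }
}
```

    The three lines should show 41 D0 96, 00 41 04 16 and 41 00 16 04: identical code values, three different byte sequences.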


    Charset
    The term charset is defined in RFC2278 (http://ietf.org/rfc/rfc2278.txt). It's the combination of a coded character set and a character-encoding scheme. The anchor class of the java.nio.charset package is Charset, which encapsulates the charset abstraction.
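    A short sketch of working with the Charset class itself: looking a charset up by an alias, reading its canonical name, and checking availability. It also shows that UTF-8 and UTF-16 are distinct Charset objects even though both cover the same coded character set (Unicode), because their encoding schemes differ. The class name CharsetLookup is illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetLookup {
    // Resolves a charset name or alias (lookup is case-insensitive) to its canonical name.
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("latin1"); // an alias of ISO-8859-1
        System.out.println("canonical name: " + latin1.name());
        System.out.println("aliases: " + latin1.aliases()); // set order is unspecified
        System.out.println("equals StandardCharsets.ISO_8859_1: " + latin1.equals(StandardCharsets.ISO_8859_1));
        // Same coded character set (Unicode), different encoding schemes, so different charsets.
        System.out.println("UTF-8 equals UTF-16: " + StandardCharsets.UTF_8.equals(StandardCharsets.UTF_16));
        // Non-standard charsets may or may not be installed in a given runtime.
        System.out.println("KOI8-R supported here: " + Charset.isSupported("KOI8-R"));
        System.out.println("platform default: " + Charset.defaultCharset());
    }
}
```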

    java.nio.charset.CharsetEncoder

    A CharsetEncoder object is a stateful transformation engine: characters go in and bytes come out. Several calls to the encoder may be required to complete a transformation. The encoder remembers the state of the transformation between calls.
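    A sketch of why "several calls may be required": here the output buffer is deliberately tiny, so encode() keeps returning OVERFLOW and must be called again after the buffer is drained, and flush() finishes the operation. The loop follows the invocation sequence described in the CharsetEncoder javadoc (encode with endOfInput, then flush); the class name, the helper drain() and the 4-byte buffer size are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ChunkedEncoding {
    // Encodes text to UTF-8 through a deliberately small output buffer, so encode()
    // has to be called repeatedly on the same encoder object.
    static byte[] encodeInChunks(String text, int bufferSize) {
        if (bufferSize < 4) {
            // UTF-8 needs up to 4 bytes for one code point; a smaller buffer could never make progress.
            throw new IllegalArgumentException("bufferSize must be at least 4");
        }
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(bufferSize);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        while (true) {
            // All input is already in memory, so endOfInput is true on every call.
            CoderResult cr = encoder.encode(in, out, true);
            if (cr.isError()) {
                throw new IllegalArgumentException("cannot encode input: " + cr);
            }
            drain(out, sink);
            if (cr.isUnderflow()) {
                break; // input fully consumed
            }
            // OVERFLOW: the output buffer was full; it has been drained, so call again.
        }
        // flush() lets the encoder emit any final bytes it is still holding.
        while (encoder.flush(out).isOverflow()) {
            drain(out, sink);
        }
        drain(out, sink);
        return sink.toByteArray();
    }

    // Moves everything written to out into sink and empties out for the next call.
    private static void drain(ByteBuffer out, ByteArrayOutputStream sink) {
        out.flip();
        while (out.hasRemaining()) {
            sink.write(out.get());
        }
        out.clear();
    }

    public static void main(String[] args) {
        String text = "Hello, \u041C\u0438\u0440!"; // "Hello, " plus a Cyrillic word
        byte[] chunked = encodeInChunks(text, 4);
        System.out.println(Arrays.equals(chunked, text.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

    In everyday code the one-shot encoder.encode(CharBuffer) or String.getBytes() hides this loop; the explicit form matters when bytes go to a fixed-size channel buffer.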



    The state is needed because an encoding algorithm may span byte boundaries when encoding characters, and some characters may encode into more bytes than others. UTF-8 works this way: depending on its code point, a character takes one to four bytes.
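    The variable length of UTF-8 is easy to observe directly. This sketch encodes one code point at a time and reports the byte count, alongside how many Java chars the code point occupies (code points above U+FFFF need a surrogate pair). The class name Utf8Lengths and the chosen sample code points are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Number of bytes UTF-8 uses for a single code point.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // LATIN CAPITAL LETTER A, CYRILLIC CAPITAL LETTER ZHE, EURO SIGN, GRINNING FACE
        int[] codePoints = {0x41, 0x416, 0x20AC, 0x1F600};
        for (int cp : codePoints) {
            System.out.printf("U+%04X -> %d byte(s) in UTF-8, %d char(s) in a Java String%n",
                    cp, utf8Length(cp), Character.charCount(cp));
        }
    }
}
```

    Expected byte counts are 1, 2, 3 and 4; only U+1F600 takes two Java chars.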


    The CharsetEncoder class is an encoding engine that converts a sequence of characters into a sequence of bytes. Provided every character is mappable in the target charset, the byte sequence can later be decoded to reconstitute the original character sequence.
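    The round trip only holds when every character can be mapped. This sketch encodes to US-ASCII with an encoder obtained from newEncoder(), whose default action for unmappable characters is REPORT (an exception); switching to CodingErrorAction.REPLACE substitutes the encoder's replacement byte ('?') instead, which loses information. The class name EncoderErrors and the convention of returning null on error are illustrative choices.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncoderErrors {
    // Encodes text to US-ASCII with a fresh encoder that applies the given action
    // to unmappable characters. Returns null if the encoder reports an error.
    static byte[] encodeAscii(String text, CodingErrorAction onUnmappable) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onUnmappableCharacter(onUnmappable);
        try {
            ByteBuffer bb = encoder.encode(CharBuffer.wrap(text)); // whole operation in one call
            byte[] bytes = new byte[bb.remaining()];
            bb.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            return null; // REPORT: some character has no US-ASCII code
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // "cafe" with an acute accent on the final e
        byte[] replaced = encodeAscii(text, CodingErrorAction.REPLACE);
        // Prints "caf?": the replacement byte, so decoding cannot restore the original.
        System.out.println(new String(replaced, StandardCharsets.US_ASCII));
        System.out.println(encodeAscii(text, CodingErrorAction.REPORT) == null); // true
    }
}
```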

    java.nio.charset.CharsetDecoder
    The CharsetDecoder class is a decoding engine that converts an encoded byte sequence into a sequence of characters.
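    A sketch of the decoding side, assuming UTF-8 input: valid bytes decode back to the character, a truncated sequence is malformed input (REPORT fails, REPLACE yields U+FFFD, the decoder's default replacement), and decoding with the wrong charset does not fail at all but silently produces different characters. The class name DecoderDemo and the null-on-error convention are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecoderDemo {
    // Decodes bytes as UTF-8 with the given action for malformed input.
    // Returns null if the decoder reports an error.
    static String decodeUtf8(byte[] bytes, CodingErrorAction onMalformed) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(onMalformed);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] zhe = {(byte) 0xD0, (byte) 0x96}; // UTF-8 encoding of U+0416
        System.out.println(decodeUtf8(zhe, CodingErrorAction.REPORT).equals("\u0416")); // true
        // A truncated sequence is malformed: REPLACE substitutes U+FFFD, REPORT fails.
        byte[] truncated = {(byte) 0xD0};
        System.out.println(decodeUtf8(truncated, CodingErrorAction.REPLACE).equals("\uFFFD")); // true
        System.out.println(decodeUtf8(truncated, CodingErrorAction.REPORT) == null); // true
        // Decoding with the wrong charset does not fail; it silently produces different characters.
        String wrong = new String(zhe, StandardCharsets.ISO_8859_1);
        System.out.printf("U+%04X U+%04X%n", (int) wrong.charAt(0), (int) wrong.charAt(1)); // U+00D0 U+0096
    }
}
```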


Sources:
1. skipy.ru: "Вавилонское столпотворение. Часть 1. Кодировки" ("The Tower of Babel. Part 1. Encodings")
2. javadoc: java.nio.charset.StandardCharsets
3. javadoc: java.nio.charset.Charset
4. javadoc: java.lang.Character
5. javadoc: java.lang.String
6. Ron Hitchens, "Java NIO", chapter 6 "Character Sets"
7. http://docs.oracle.com/javase/tutorial/i18n/text/unicode.html
8. http://docs.oracle.com/javaee/5/tutorial/doc/bncno.html
9. http://docs.oracle.com/javaee/5/tutorial/doc/bnayb.html