Official documentation: http://kafka.apache.org/090/documentation.html#messages
Messages consist of a fixed-size header and a variable-length opaque byte array payload. The header contains a format version and a CRC32 checksum to detect corruption or truncation. Leaving the payload opaque is the right decision: there is a great deal of progress being made on serialization libraries right now, and any particular choice is unlikely to be right for all uses. Needless to say, a particular application using Kafka would likely mandate a particular serialization type as part of its usage. The MessageSet interface is simply an iterator over messages with specialized methods for bulk reading and writing to an NIO Channel.
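To make the header-plus-opaque-payload idea concrete, here is a minimal Python sketch of the magic-0 layout from the 0.9 documentation: a 1-byte magic identifier, a 4-byte CRC32 of the payload, then the payload itself. The names `encode_v0` and `decode_v0` are hypothetical, and big-endian byte order is an assumption of this sketch; it is an illustration, not Kafka's implementation.

```python
import struct
import zlib

MAGIC_V0 = 0  # magic byte value for the oldest layout


def encode_v0(payload: bytes) -> bytes:
    """Build magic (1 byte) + CRC32 of the payload (4 bytes) + payload."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    # ">BI": big-endian (assumed), unsigned byte, unsigned 32-bit int, no padding.
    return struct.pack(">BI", MAGIC_V0, crc) + payload


def decode_v0(message: bytes) -> bytes:
    """Return the payload; raise ValueError on a bad magic byte or checksum."""
    if len(message) < 5:
        raise ValueError("message shorter than the 5-byte header")
    magic, crc = struct.unpack_from(">BI", message, 0)
    if magic != MAGIC_V0:
        raise ValueError(f"unexpected magic byte {magic}")
    payload = message[5:]  # the remaining N - 5 bytes
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        raise ValueError("CRC32 mismatch: message is corrupt or truncated")
    return payload
```

The payload is never interpreted here, which mirrors the point about leaving serialization to the application: the checksum only guards the bytes, whatever they encode.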
/**
 * A message. The format of an N byte message is the following:
 *
 * If magic byte is 0
 *
 * 1. 1 byte "magic" identifier to allow format changes
 *
 * 2. 4 byte CRC32 of the payload
 *
 * 3. N - 5 byte payload
 *
 * If magic byte is 1
 *
 * 1. 1 byte "magic" identifier to allow format changes
 *
 * 2. 1 byte "attributes" identifier to allow annotations on the message independent of the version (e.g. compression enabled, type of codec used)
 *
 * 3. 4 byte CRC32 of the payload
 *
 * 4. N - 6 byte payload
 *
 */
Official documentation: http://kafka.apache.org/0100/documentation.html#messages
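As a rough illustration of the 0.10 layout with magic 1 (CRC, magic, attributes, timestamp, length-prefixed key, length-prefixed payload), the following Python sketch encodes and decodes one message. It is not Kafka's code: the names `encode_v1` and `decode_v1` are hypothetical, big-endian byte order is assumed, the CRC is assumed to cover every byte after the CRC field, and only non-null keys and payloads are handled.

```python
import struct
import zlib

CODECS = {0: "none", 1: "gzip", 2: "snappy", 3: "lz4"}
TIMESTAMP_TYPES = {0: "create time", 1: "log append time"}


def encode_v1(key: bytes, value: bytes, timestamp_ms: int, attributes: int = 0) -> bytes:
    """Lay out magic=1, attributes, timestamp, key and payload behind a CRC32."""
    body = struct.pack(">bBq", 1, attributes, timestamp_ms)
    body += struct.pack(">i", len(key)) + key
    body += struct.pack(">i", len(value)) + value
    crc = zlib.crc32(body) & 0xFFFFFFFF  # assumption: CRC covers all bytes after it
    return struct.pack(">I", crc) + body


def decode_v1(message: bytes) -> dict:
    """Parse a message produced by encode_v1 and unpack the attribute bits."""
    (crc,) = struct.unpack_from(">I", message, 0)
    if zlib.crc32(message[4:]) & 0xFFFFFFFF != crc:
        raise ValueError("CRC32 mismatch")
    magic, attributes, timestamp_ms = struct.unpack_from(">bBq", message, 4)
    pos = 4 + 10  # CRC + magic + attributes + timestamp
    (key_len,) = struct.unpack_from(">i", message, pos)
    pos += 4
    key = message[pos:pos + key_len]
    pos += key_len
    (value_len,) = struct.unpack_from(">i", message, pos)
    pos += 4
    value = message[pos:pos + value_len]
    return {
        "magic": magic,
        "codec": CODECS.get(attributes & 0x07, "unknown"),  # bits 0 ~ 2
        "timestamp_type": TIMESTAMP_TYPES[(attributes >> 3) & 1],  # bit 3
        "timestamp_ms": timestamp_ms,
        "key": key,
        "value": value,
    }
```

For example, `decode_v1(encode_v1(b"k", b"v", 1500000000000, attributes=0b1010))` reports the snappy codec and a log append timestamp, because bits 0 ~ 2 hold the value 2 and bit 3 is set.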
/**
 * 1. 4 byte CRC32 of the message
 * 2. 1 byte "magic" identifier to allow format changes, value is 0 or 1
 * 3. 1 byte "attributes" identifier to allow annotations on the message independent of the version
 *    bit 0 ~ 2 : Compression codec.
 *      0 : no compression
 *      1 : gzip
 *      2 : snappy
 *      3 : lz4
 *    bit 3 : Timestamp type
 *      0 : create time
 *      1 : log append time
 *    bit 4 ~ 7 : reserved
 * 4. (Optional) 8 byte timestamp only if "magic" identifier is greater than 0
 * 5. 4 byte key length, containing length K
 * 6. K byte key
 * 7. 4 byte payload length, containing length V
 * 8. V byte payload
 */
Official documentation: http://kafka.apache.org/0110/documentation.html#messageformat
Messages (aka Records) are always written in batches. The technical term for a batch of messages is a record batch, and a record batch contains one or more records. In the degenerate case, we could have a record batch containing a single record. Record batches and records have their own headers. The format of each is described below for Kafka version 0.11.0 and later (message format version v2, or magic=2). Message formats 0 and 1 are covered in the earlier sections.
baseOffset: int64
batchLength: int32
partitionLeaderEpoch: int32
magic: int8 (current magic value is 2)
crc: int32
attributes: int16
    bit 0~2:
        0: no compression
        1: gzip
        2: snappy
        3: lz4
    bit 3: timestampType
    bit 4: isTransactional (0 means not transactional)
    bit 5: isControlBatch (0 means not a control batch)
    bit 6~15: unused
lastOffsetDelta: int32
firstTimestamp: int64
maxTimestamp: int64
producerId: int64
producerEpoch: int16
baseSequence: int32
records: [Record]
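The fixed-width fields listed above (everything before `records`) add up to 57 bytes, so they can be read with a single `struct` format. The Python sketch below is a hedged illustration only: the names `BatchHeader`, `parse_batch_header` and `decode_attributes` are hypothetical, big-endian byte order is assumed, `crc` and `attributes` are read as unsigned for convenience, and the sketch neither verifies the checksum nor parses the records that follow.

```python
import struct
from typing import NamedTuple

# baseOffset, batchLength, partitionLeaderEpoch, magic, crc, attributes,
# lastOffsetDelta, firstTimestamp, maxTimestamp, producerId, producerEpoch,
# baseSequence (big-endian is an assumption of this sketch).
HEADER = struct.Struct(">qiibIHiqqqhi")

COMPRESSION = {0: "none", 1: "gzip", 2: "snappy", 3: "lz4"}


class BatchHeader(NamedTuple):
    base_offset: int
    batch_length: int
    partition_leader_epoch: int
    magic: int
    crc: int
    attributes: int
    last_offset_delta: int
    first_timestamp: int
    max_timestamp: int
    producer_id: int
    producer_epoch: int
    base_sequence: int


def parse_batch_header(buf: bytes) -> BatchHeader:
    """Read the fixed-width fields from the start of a record batch."""
    return BatchHeader(*HEADER.unpack_from(buf, 0))


def decode_attributes(attributes: int) -> dict:
    """Split the int16 attributes field into its documented bit fields."""
    return {
        "compression": COMPRESSION.get(attributes & 0x07, "unknown"),  # bit 0~2
        "timestamp_type": (attributes >> 3) & 1,                       # bit 3
        "is_transactional": bool((attributes >> 4) & 1),               # bit 4
        "is_control_batch": bool((attributes >> 5) & 1),               # bit 5
    }
```

A quick round trip: packing a header with `attributes=0b010001` and passing the parsed value to `decode_attributes` yields gzip compression with the transactional bit set and the control-batch bit clear.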