`Kafka` is a distributed streaming platform which uses logs as the unit of
storage for messages passed within the system. It is horizontally scalable,
fault-tolerant, fast, and runs in production in thousands of companies.
Likely your business is already using it in some form.


.. _`Kafka`: https://kafka.apache.org/intro

What you must know about Apache Kafka to use Faust
--------------------------------------------------
**Topics**

    A topic is a stream name to which records are published.
    Topics in Kafka are always multi-subscriber; that is, a topic can have zero,
    one, or many consumers that subscribe to the data written to it. Topics are
    also the base abstraction of where data lives within Kafka. Each topic is
    backed by logs which are *partitioned* and distributed.

    Faust uses the abstraction of a topic to both consume data from a stream as
    well as publish data to more streams represented by Kafka topics.
    A Faust application needs to be consuming from at least one topic and may
    create many intermediate topics as a by-product of your streaming
    application. Every topic that is not created by Faust internally can be
    thought of as a source (for your application to process) or sink
    (for other systems to pick up).

**Partitions**

    Partitions are the fundamental unit within Kafka where data lives. Every
    topic is split into one or more partitions. These partitions are
    represented internally as logs and messages always make their
    way to one partition (log). Each partition is replicated and has one leader
    at any point in time.

    Faust uses the notion of a partition to maintain order amongst messages and
    as a way to split the processing of data to increase throughput. A Faust
    application uses the notion of a "key" to make sure that messages that
    should appear together and be processed on the same box do end up on the
    same box.

**Fault Tolerance**

    Every partition is replicated with the total copies represented by the
    In Sync Replicas (ISR). Every ISR is a candidate to take over as the new
    leader should the current leader fail. The maximum number of faulty Kafka
    brokers that can be tolerated is the number of ISR - 1. I.e., if every
    partition has three replicas, all fault tolerance guarantees hold as long
    as at least one replica is functional

    Faust has the same guarantees that Kafka offers with regards to fault
    tolerance of the data.

**Distribution of load/work**

    For every partition, all reads and writes are routed to the leader of that
    partition. For a specific topic, the load is as distributed as the number
    of partitions. *Note:* Since the partition is the lowest degree of parallel
    processing of messages, the number of partitions control how much many
    parallel instances of the consumers can operate on messages.

    Faust uses parallel consumers and therefore is also limited by the number
    of partitions to dictate how many concurrent Faust application instances can
    run to distribute work. Extra Faust application instances beyond the source
    topic partition count will be idle and not improve message processing rates.


**Offsets**
    For every <topic, partition> combination Kafka keeps track of the offset
    of messages in the log to know where new messages should be appended. On a
    consumer level, offsets are maintained at the <group, topic, partition>
    level for consumers to know where to continue consuming for a given
    "group". The group acts as a namespace for consumers to register when
    multiple consumers want to share the load on a single topic.

    Kafka maintains processing guarantees of at least once by committing offsets
    after message consumption. Once an offset has been committed at the consumer
    level, the message at that offset for the <group, topic, partition> will not
    be reread.

    Faust uses the notion of a group to maintain a namespace within an app.
    Faust commits offsets after when a message is processed through all of its
    operations. Faust allows a configurable *commit interval* which makes sure
    that all messages that have been processed completely since the last
    interval will be committed.

**Log Compaction**

    Log compaction is a methodology Kafka uses to make sure that as data for a key
    changes it will not affect the size of the log such that every state change
    is maintained for all time. Only the most recent value is guaranteed to be
    available. Periodic compaction removes all values for a key except the last
    one.

    Tables in Faust use log compaction to ensure table state can
    be recovered without a large space overhead.


This summary and information about Kafka is adapted from original documentation
on Kafka available at https://kafka.apache.org/
