Metadata-Version: 2.1
Name: fastkafka
Version: 0.1.0rc1
Summary: FastKafka is a powerful and easy-to-use Python library for building asynchronous web services that interact with Kafka topics. Built on top of FastAPI, Starlette, Pydantic, AIOKafka and AsyncAPI, FastKafka simplifies the process of writing producers and consumers for Kafka topics.
Home-page: https://github.com/airtai/fastkafka
Author: airt
Author-email: info@airt.ai
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: dev
License-File: LICENSE

FastKafka
================

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

<b>Effortless Kafka integration for your web services</b>

------------------------------------------------------------------------

![PyPI](https://img.shields.io/pypi/v/fastkafka.png) ![PyPI -
Downloads](https://img.shields.io/pypi/dm/fastkafka.png) ![PyPI - Python
Version](https://img.shields.io/pypi/pyversions/fastkafka.png)

![GitHub Workflow
Status](https://img.shields.io/github/actions/workflow/status/airtai/fastkafka/test.yaml)
![CodeQL](https://github.com/airtai/fastkafka//actions/workflows/codeql.yml/badge.svg)
![Dependency
Review](https://github.com/airtai/fastkafka//actions/workflows/dependency-review.yml/badge.svg)

![GitHub](https://img.shields.io/github/license/airtai/fastkafka.png)

------------------------------------------------------------------------

[FastKafka](fastkafka.airt.ai) is a powerful and easy-to-use Python
library for building asynchronous services that interact with Kafka
topics. Built on top of [Pydantic](https://docs.pydantic.dev/),
[AIOKafka](https://github.com/aio-libs/aiokafka) and
[AsyncAPI](https://www.asyncapi.com/), FastKafka simplifies the process
of writing producers and consumers for Kafka topics, handling all the
parsing, networking, task scheduling and data generation automatically.
With FastKafka, you can quickly prototype and develop high-performance
Kafka-based services with minimal code, making it an ideal choice for
developers looking to streamline their workflow and accelerate their
projects.

## Install

FastKafka works on macOS, Linux, and most Unix-style operating systems.
You can install it with `pip` as usual:

``` sh
pip install fastkafka
```

## Writing server code

Here is an example python script using FastKafka that takes data from a
Kafka topic, makes a prediction using a predictive model, and outputs
the prediction to another Kafka topic.

### Messages

FastKafka uses [Pydantic](https://docs.pydantic.dev/) to parse input
JSON-encoded data into Python objects, making it easy to work with
structured data in your Kafka-based applications. Pydantic’s
[`BaseModel`](https://docs.pydantic.dev/usage/models/) class allows you
to define messages using a declarative syntax, making it easy to specify
the fields and types of your messages.

This example defines two message classes for use in a FastKafka
application:

- The `InputData` class is used to represent input data for a predictive
  model. It has three fields: `user_id`, `feature_1`, and `feature_2`.
  The `user_id` field is of type
  [`NonNegativeInt`](https://docs.pydantic.dev/usage/types/#constrained-types),
  which is a subclass of int that only allows non-negative integers. The
  `feature_1` and `feature_2` fields are both lists of floating-point
  numbers and integers, respectively.

- The `Prediction` class is used to represent the output of the
  predictive model. It has two fields: `user_id` and `score`. The
  `score` field is a floating-point number and it represents the
  prediction made by the model, such as the probability of churn in the
  next 28 days.

These message classes will be used to parse and validate incoming data
in Kafka consumers and producers.

``` python
from typing import List
from pydantic import BaseModel, Field, NonNegativeInt

class InputData(BaseModel):
    user_id: NonNegativeInt = Field(..., example=202020, description="ID of a user")
    feature_1: List[float] = Field(
        ...,
        example=[1.2, 2.3, 4.5, 6.7, 0.1],
        description="input feature 1",
    )
    feature_2: List[int] = Field(
        ...,
        example=[2, 4, 3, 1, 0],
        description="input feature 2",
    )

class Prediction(BaseModel):
    user_id: NonNegativeInt = Field(..., example=202020, description="ID of a user")
    score: float = Field(
        ...,
        example=0.4321,
        description="Prediction score (e.g. the probability of churn in the next 28 days)",
        ge=0.0,
        le=1.0,
    )
```

These message classes will be used to parse and validate incoming data
in a Kafka consumer and to produce a JSON-encoded message in a producer.
Using Pydantic’s BaseModel in combination with FastKafka makes it easy
to work with structured data in your Kafka-based applications.

### Application

This example shows how to initialize a FastKafka application.

It starts by defining a dictionary called `kafka_brokers`, which
contains two entries: `"localhost"` and `"production"`, specifying local
development and production Kafka brokers. Each entry specifies the URL,
port, and other details of a Kafka broker. This dictionary is used for
generating the documentation only and it is not being checked by the
actual server.

Next, an object of the `FastAPI` class is created. It role is to serve
the documentation and to start and shutdown
[`FastKafka`](https://airtai.github.io/fastkafka/fastkafka.html#fastkafka).

Finally, an object of the
[`FastKafka`](https://airtai.github.io/fastkafka/fastkafka.html#fastkafka)
class is initialized with the minimum set of arguments:

- `app`: an `FastAPI` application used for serving the documentation and
  starting/shutting down the service

- `kafka_brokers`: a dictionary used for generation of documentation

- `bootstrap_servers`: a `host[:port]` string or list of `host[:port]`
  strings that a consumer or a producer should contact to bootstrap
  initial cluster metadata

``` python

from os import environ

from fastkafka.application import FastKafka

kafka_brokers = {
    "localhost": {
        "url": "localhost",
        "description": "local development kafka broker",
        "port": 9092,
    },
    "production": {
        "url": "kafka.airt.ai",
        "description": "production kafka broker",
        "port": 9092,
        "protocol": "kafka-secure",
        "security": {"type": "plain"},
    },
}

bootstrap_servers = f"{environ['KAFKA_HOSTNAME']}:{environ['KAFKA_PORT']}"

kafka_app = FastKafka(
    kafka_brokers=kafka_brokers,
    bootstrap_servers=bootstrap_servers,
)
```

### Function decorators

FastKafka provides convenient function decorators `@kafka_app.consumes`
and `@kafka_app.produces` to allow you to delegate the actual process of

- consuming and producing data to Kafka, and

- decoding and encoding JSON encode messages

from user defined functions to the framework. The FastKafka framework
delegates these jobs to AIOKafka and Pydantic libraries.

These decorators make it easy to specify the processing logic for your
Kafka consumers and producers, allowing you to focus on the core
business logic of your application without worrying about the underlying
Kafka integration.

This following example shows how to use the `@kafka_app.consumes` and
`@kafka_app.produces` decorators in a FastKafka application:

- The `@kafka_app.consumes` decorator is applied to the `on_input_data`
  function, which specifies that this function should be called whenever
  a message is received on the “input_data” Kafka topic. The
  `on_input_data` function takes a single argument which is expected to
  be an instance of the `InputData` message class. Specifying the type
  of the single argument is instructing the Pydantic to use
  `InputData.parse_raw()` on the consumed message before passing it to
  the user defined function `on_input_data`.

- The `@produces` decorator is applied to the `to_predictions` function,
  which specifies that this function should produce a message to the
  “predictions” Kafka topic whenever it is called. The `to_predictions`
  function takes two arguments: `user_id` and `score`. It creates a new
  `Prediction` message with these values and then returns it. The
  framework will call the `Prediction.json().encode("utf-8")` function
  on the returned value and produce it to the specified topic.

``` python

@kafka_app.consumes(topic="input_data", auto_offset_reset="latest", group_id="my_group")
async def on_input_data(msg: InputData):
    global model
    score = await model.predict(feature_1=msg.feature_1, feature_2=msg.feature_2)
    await to_predictions(user_id=msg.user_id, score=score)


@kafka_app.produces(topic="predictions")
async def to_predictions(user_id: int, score: float) -> Prediction:
    prediction = Prediction(user_id=user_id, score=score)
    return prediction


# this is a mock up for testing, should be replaced with the real model
class Model:
    async def predict(self, feature_1: List[int], feature_2: List[float]) -> float:
        return 0.87


model = Model()
```

## Running the service

The service can be started using builtin faskafka run CLI command

We will concatenate the code snippets from above and save them in a file
`"server.py"`

??? Example

    This example contains the content of the file "server.py":

    ```python

    from typing import List
    from pydantic import BaseModel, Field, NonNegativeInt

    class InputData(BaseModel):
        user_id: NonNegativeInt = Field(..., example=202020, description="ID of a user")
        feature_1: List[float] = Field(
            ...,
            example=[1.2, 2.3, 4.5, 6.7, 0.1],
            description="input feature 1",
        )
        feature_2: List[int] = Field(
            ...,
            example=[2, 4, 3, 1, 0],
            description="input feature 2",
        )

    class Prediction(BaseModel):
        user_id: NonNegativeInt = Field(..., example=202020, description="ID of a user")
        score: float = Field(
            ...,
            example=0.4321,
            description="Prediction score (e.g. the probability of churn in the next 28 days)",
            ge=0.0,
            le=1.0,
        )



    from os import environ

    from fastkafka.application import FastKafka

    kafka_brokers = {
        "localhost": {
            "url": "localhost",
            "description": "local development kafka broker",
            "port": 9092,
        },
        "production": {
            "url": "kafka.airt.ai",
            "description": "production kafka broker",
            "port": 9092,
            "protocol": "kafka-secure",
            "security": {"type": "plain"},
        },
    }

    bootstrap_servers = f"{environ['KAFKA_HOSTNAME']}:{environ['KAFKA_PORT']}"

    kafka_app = FastKafka(
        kafka_brokers=kafka_brokers,
        bootstrap_servers=bootstrap_servers,
    )



    @kafka_app.consumes(topic="input_data", auto_offset_reset="latest", group_id="my_group")
    async def on_input_data(msg: InputData):
        global model
        score = await model.predict(feature_1=msg.feature_1, feature_2=msg.feature_2)
        await to_predictions(user_id=msg.user_id, score=score)


    @kafka_app.produces(topic="predictions")
    async def to_predictions(user_id: int, score: float) -> Prediction:
        prediction = Prediction(user_id=user_id, score=score)
        return prediction


    # this is a mock up for testing, should be replaced with the real model
    class Model:
        async def predict(self, feature_1: List[int], feature_2: List[float]) -> float:
            return 0.87


    model = Model()


    ```

Then, we start the FaskKafka servie by running the following command in
the folder where the server.py file is located:

``` shell
fastkafka run --num-workers=1 server:kafka_app
```

After running the command, you should see an output like the one below:

    [5047]: [INFO] fastkafka.application: _create_producer() : created producer using the config: '{'bootstrap_servers': 'tvrtko-fastkafka-kafka-1:9092'}'
    [5047]: [INFO] fastkafka._components.aiokafka_consumer_loop: aiokafka_consumer_loop() starting...
    [5047]: [INFO] fastkafka._components.aiokafka_consumer_loop: aiokafka_consumer_loop(): Consumer created using the following parameters: {'bootstrap_servers': 'tvrtko-fastkafka-kafka-1:9092', 'auto_offset_reset': 'latest', 'max_poll_records': 100, 'group_id': 'my_group'}
    [5047]: [INFO] fastkafka._components.aiokafka_consumer_loop: aiokafka_consumer_loop(): Consumer started.
    [5047]: [INFO] aiokafka.consumer.subscription_state: Updating subscribed topics to: frozenset({'input_data'})
    [5047]: [INFO] aiokafka.consumer.consumer: Subscribed to topic(s): {'input_data'}
    [5047]: [INFO] fastkafka._components.aiokafka_consumer_loop: aiokafka_consumer_loop(): Consumer subscribed.
    [5047]: [INFO] aiokafka.consumer.group_coordinator: Discovered coordinator 1001 for group my_group
    [5047]: [INFO] aiokafka.consumer.group_coordinator: Revoking previously assigned partitions set() for group my_group
    [5047]: [INFO] aiokafka.consumer.group_coordinator: (Re-)joining group my_group
    [5047]: [INFO] aiokafka.consumer.group_coordinator: Joined group 'my_group' (generation 3) with member_id aiokafka-0.8.0-4638ef56-bd7d-4581-95ad-42c90322442d
    [5047]: [INFO] aiokafka.consumer.group_coordinator: Elected group leader -- performing partition assignments using roundrobin
    [5047]: [INFO] aiokafka.consumer.group_coordinator: Successfully synced group my_group with generation 3
    [5047]: [INFO] aiokafka.consumer.group_coordinator: Setting newly assigned partitions {TopicPartition(topic='input_data', partition=0)} for group my_group
    Starting process cleanup, this may take a few seconds...
    [INFO] fastkafka.server: terminate_asyncio_process(): Terminating the process 5047...
    [5047]: [INFO] aiokafka.consumer.group_coordinator: LeaveGroup request succeeded
    [5047]: [INFO] fastkafka._components.aiokafka_consumer_loop: aiokafka_consumer_loop(): Consumer stopped.
    [5047]: [INFO] fastkafka._components.aiokafka_consumer_loop: aiokafka_consumer_loop() finished.
    [INFO] fastkafka.server: terminate_asyncio_process(): Process 5047 terminated.

When the service is started, several log messages are printed to the
console, including information about the application startup, AsyncAPI
specification generation, and consumer loop status.

During the lifetime of the service, incoming requests will be processed
by the FastKafka application and appropriate actions will be taken based
on the defined Kafka consumers and producers. For example, if a message
is received on the “input_data” Kafka topic, the `on_input_data`
function will be called to process the message, and if the
`to_predictions` function is called, it will produce a message to the
“predictions” Kafka topic. The service will continue to run until it is
shut down, at which point the application shutdown process will be
initiated and the service will stop.

### Checking out the documentation

To generate and serve the documentation **locally**, you can use the
built in kafka function that will do all the work for you. In the folder
where the server.py file is located, run the following command:

``` shell
fastkafka docs serve server:kafka_app
```

After running the command you should see the following output:

``` python
exit_code, output = await run_script_and_cancel(
    script=script,
    script_file="server.py",
    cmd=cmd,
    cancel_after=45,
)
print(output.decode("utf-8"))
assert exit_code == 0, exit_code
print(output.decode("utf-8"))
```

    {'localhost': {'description': 'local development kafka broker',
                   'port': 9092,
                   'url': 'localhost'},
     'production': {'description': 'production kafka broker',
                    'port': 9092,
                    'protocol': 'kafka-secure',
                    'security': {'type': 'plain'},
                    'url': 'kafka.airt.ai'}}

The generated documentation is as follows:

![Kafka_servers](images/screenshot-kafka-servers.png)

Next, you can see the documentation generated from the `@consumes`
decorator when used on the function `on_input_data` with a single
parameter of type `InputData`:

``` python
class InputData(BaseModel):
    user_id: NonNegativeInt = Field(..., example=202020, description="ID of a user")
    feature_1: List[float] = Field(
        ...,
        example=[1.2, 2.3, 4.5, 6.7, 0.1],
        description="input feature 1",
    )
    feature_2: List[int] = Field(
        ...,
        example=[2, 4, 3, 1, 0],
        description="input feature 2",
    )
```

``` python
@kafka_app.consumes(topic="input_data", auto_offset_reset="latest", group_id="my_group")
async def on_input_data(msg: InputData):
    global model
    score = await model.predict(feature_1=msg.feature_1, feature_2=msg.feature_2)
    await to_predictions(user_id=msg.user_id, score=score)
```

The resulting documentation is generated as follows:

![Kafka_consumer](images/screenshot-kafka-consumer.png)

## Testing the service

``` python
from os import environ

import anyio
import asyncer
from tqdm.notebook import tqdm, trange

from fastkafka.helpers import (
    consumes_messages,
    create_missing_topics,
    create_admin_client,
    produce_messages,
    wait_for_get_url,
)

from fastkafka.testing import LocalKafkaBroker
```

``` python
num_workers = 4
listener_port = 9913
msgs = [
    dict(user_id=i, feature_1=[(i / 1_000) ** 2], feature_2=[i % 177])
    for i in trange(5_000, desc="generating messages")
]

bootstrap_servers = f"{environ['KAFKA_HOSTNAME']}:{environ['KAFKA_PORT']}"

async with LocalKafkaBroker(
    zookeeper_port=9911, listener_port=listener_port
) as bootstrap_servers:
    create_missing_topics(
        ["input_data", "predictions"],
        bootstrap_servers=bootstrap_servers,
        num_partitions=4,
    )

    async with asyncer.create_task_group() as tg:
        output = tg.soonify(run_script_and_cancel)(
            script=script.replace(
                "bootstrap_servers = f\"{environ['KAFKA_HOSTNAME']}:{environ['KAFKA_PORT']}\"",
                f'bootstrap_servers = "localhost:{listener_port}"',
            ),
            script_file=f"server.py",
            cmd=f"fastkafka run --num-workers={num_workers} server:kafka_app",
            cancel_after=120,
        )

        tg.soonify(consumes_messages)(
            msgs_count=len(msgs),
            topic="predictions",
            bootstrap_servers=bootstrap_servers,
        )

        await anyio.sleep(10)

        tg.soonify(produce_messages)(
            msgs=msgs, topic="input_data", bootstrap_servers=bootstrap_servers
        )

print(output.value[1].decode("UTF-8"))
```

    generating messages:   0%|          | 0/20000 [00:00<?, ?it/s]

    [INFO] fastkafka.testing: Java is already installed.
    [INFO] fastkafka.testing: But not exported to PATH, exporting...
    [INFO] fastkafka.testing: Kafka is already installed.
    [INFO] fastkafka.testing: But not exported to PATH, exporting...
    [INFO] fastkafka.helpers: create_missing_topics(['input_data', 'predictions']): new_topics = [NewTopic(topic=predictions,num_partitions=4)]
    [INFO] aiokafka.consumer.subscription_state: Updating subscribed topics to: frozenset({'predictions'})
    [INFO] aiokafka.consumer.group_coordinator: Metadata for topic has changed from {} to {'predictions': 4}. 

    consuming from 'predictions':   0%|          | 0/20000 [00:00<?, ?it/s]

    producing to 'input_data':   0%|          | 0/20000 [00:00<?, ?it/s]

    [INFO] fastkafka.components._subprocess: terminate_asyncio_process(): Terminating the process 5435...
    [INFO] fastkafka.components._subprocess: terminate_asyncio_process(): Process 5435 was already terminated.
    [INFO] fastkafka.components._subprocess: terminate_asyncio_process(): Terminating the process 5052...
    [INFO] fastkafka.components._subprocess: terminate_asyncio_process(): Process 5052 was already terminated.

    PendingValueException: The return value of this task is still pending. Maybe you forgot to access it after the async with asyncer.create_task_group() block. If you need to access values of async tasks inside the same task group, you probably need a different approach, for example with AnyIO Streams.


