Metadata-Version: 2.1
Name: datacycle
Version: 0.0.2
Summary: General toolset to backup & restore with random/filtered/anonymized data (Mongo/Postgres/GCS).
Home-page: https://github.com/smood/recycle
License: MIT
Requires-Python: >=3.8,<3.11
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Provides-Extra: all
Provides-Extra: google
Provides-Extra: mongo
Provides-Extra: postgres
Requires-Dist: Faker (>=13.3.4,<14.0.0)
Requires-Dist: SQLAlchemy (>=1.4.13,<2.0.0); extra == "all" or extra == "postgres"
Requires-Dist: dataconf (>=1.4.0,<2.0.0)
Requires-Dist: furl (>=2.1.2,<3.0.0)
Requires-Dist: gsutil (>=5.9,<6.0); extra == "all" or extra == "google"
Requires-Dist: pg8000 (>=1.19.4,<2.0.0); extra == "all" or extra == "postgres"
Requires-Dist: pymongo (>=4.1.0,<5.0.0); extra == "all" or extra == "mongo"
Requires-Dist: sqlparse (>=0.4.1,<0.5.0)
Requires-Dist: tdqm (>=0.0.1,<0.0.2)
Requires-Dist: typer (>=0.4.1,<0.5.0)
Project-URL: Repository, https://github.com/smood/recycle
Description-Content-Type: text/markdown

# Datacycle

## Getting started

Run locally with Poetry:

```
cp .env.example .env
vim .env
source .env

poetry install --extras all
poetry run datacycle
```

Or build and run the Docker image:

```
docker build -f Dockerfile -t datacycle .
docker run -it --rm --env-file .env datacycle
```

### Mac requirements

```
brew install mongodb/brew/mongodb-database-tools
brew install libpq
brew link --force libpq
npm install elasticdump -g
```

### Linux requirements

```
apt install -y mongo-tools
apt install -y postgresql-client
npm install elasticdump -g
```

## How to

```
datacycle --help
datacycle doctor

datacycle mongo "mongodb://user:password@localhost:27017/test1?authSource=admin" "mongodb://user:password@localhost:27017/test2?authSource=admin" --transform "
    transforms {
        test1 {
            before-transform {}
        }
    }
"

# Mongo to GCS, with transforms read from a file
datacycle mongo "mongodb://user:password@localhost:27017/test1?authSource=admin" gs://datacycle-test/test1/snapshot --transform ops.hocon

# From a Mongo database to Mongo, GCS, or a local path
datacycle mongo "mongodb://user:password@localhost:27017/test1?authSource=admin" "mongodb://user:password@localhost:27017/test2?authSource=admin"
datacycle mongo "mongodb://user:password@localhost:27017/test1?authSource=admin" gs://datacycle-test/test1/snapshot
datacycle mongo "mongodb://user:password@localhost:27017/test1?authSource=admin" test1/snapshot

# From a GCS snapshot to Mongo, GCS, or a local path
datacycle mongo gs://datacycle-test/test1/snapshot "mongodb://user:password@localhost:27017/test2?authSource=admin"
datacycle mongo gs://datacycle-test/test1/snapshot gs://datacycle-test/test2/snapshot
datacycle mongo gs://datacycle-test/test1/snapshot test2/snapshot

# From a local snapshot to Mongo, GCS, or a local path
datacycle mongo test1/snapshot "mongodb://user:password@localhost:27017/test2?authSource=admin"
datacycle mongo test1/snapshot gs://datacycle-test/test2/snapshot
datacycle mongo test1/snapshot test2/snapshot
```
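
The commands above accept three kinds of source and destination: a Mongo connection URI, a `gs://` URL, and a plain path. As an illustration only, the sketch below shows one way such arguments could be told apart by URL scheme. `classify_location` is a hypothetical helper and not necessarily how datacycle does it.

```python
from urllib.parse import urlparse


def classify_location(location: str) -> str:
    """Return 'mongo', 'gcs' or 'local' for a source/destination argument.

    Hypothetical helper for illustration; datacycle's own parsing may differ.
    """
    scheme = urlparse(location).scheme
    if scheme in ("mongodb", "mongodb+srv"):
        return "mongo"
    if scheme == "gs":
        return "gcs"
    if scheme == "":
        # No scheme: treat the argument as a path on the local file system.
        return "local"
    raise ValueError(f"unsupported location: {location!r}")
```

Under this sketch, `test1/snapshot` is classified as `local` and `gs://datacycle-test/test1/snapshot` as `gcs`.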

## Providers

### Postgres

Backup approaches (see https://www.postgresql.org/docs/9.1/backup.html):

- SQL dump
- file system snapshot
- continuous archiving

```
pg_dump --clean "postgres://user:password@localhost:5432/test" | gzip > dump.gz
gunzip -c dump.gz | psql "postgres://user:password@localhost:5432/test"
```
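
The same SQL dump pipeline can be driven from Python. This is a rough sketch under the assumption that `pg_dump` is on `PATH`. The function names are hypothetical and this is not datacycle's own code.

```python
import gzip
import shutil
import subprocess


def pg_dump_command(uri: str) -> list:
    """Build the pg_dump invocation used in the shell example above."""
    return ["pg_dump", "--clean", uri]


def dump_to_gzip(uri: str, path: str = "dump.gz") -> None:
    """Stream pg_dump output into a gzip file without buffering it all in memory."""
    with subprocess.Popen(pg_dump_command(uri), stdout=subprocess.PIPE) as proc:
        with gzip.open(path, "wb") as out:
            shutil.copyfileobj(proc.stdout, out)
    # Leaving the Popen context waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"pg_dump exited with code {proc.returncode}")
```

Calling `dump_to_gzip("postgres://user:password@localhost:5432/test")` would write `dump.gz` in the current directory.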

### Mongo

https://docs.mongodb.com/manual/core/backups/

- BSON dump
- file system snapshot
- change data capture (CDC)

```
mongodump --uri="mongodb://user:password@localhost:27017/test?authSource=admin" --out=dump --numParallelCollections=10 -v --gzip
mongorestore --uri="mongodb://user:password@localhost:27017/test?authSource=admin" dump/test --numParallelCollections=10 -v --gzip
```
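
Note that `mongodump --out=dump` writes each database to its own subdirectory, which is why the restore command points at `dump/test`. The sketch below builds both commands and derives that subdirectory from the source URI. The helper names are hypothetical and the flags simply mirror the shell example above.

```python
from urllib.parse import urlparse


def database_name(uri: str) -> str:
    """Extract the database name from a Mongo URI path, e.g. 'test'."""
    return urlparse(uri).path.lstrip("/")


def mongodump_command(uri: str, out: str = "dump", parallel: int = 10) -> list:
    """Build a mongodump invocation with the flags from the example above."""
    return ["mongodump", f"--uri={uri}", f"--out={out}",
            f"--numParallelCollections={parallel}", "-v", "--gzip"]


def mongorestore_command(uri: str, dump_dir: str, parallel: int = 10) -> list:
    """Build a mongorestore invocation that reads a single database directory."""
    return ["mongorestore", f"--uri={uri}", dump_dir,
            f"--numParallelCollections={parallel}", "-v", "--gzip"]


source = "mongodb://user:password@localhost:27017/test?authSource=admin"
# mongodump places the database under <out>/<database name>.
dump_dir = f"dump/{database_name(source)}"
```

Either list can be passed to `subprocess.run(..., check=True)` to execute it.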

### Elasticsearch

https://github.com/elasticsearch-dump/elasticsearch-dump

- dump

```
elasticdump --input=https://localhost:9200 --output=$ --limit 2000 | gzip > dump.gz
```

