Metadata-Version: 2.1
Name: scd2
Version: 1.0.0
Summary: slowly changing dimension type 2 with pandas or parquet
Home-page: https://gitlab.com/liranc/pandas_scd
Keywords: scd,slowly changing dimension,type 2,pandas,parquet
Requires-Python: >=3.8
Description-Content-Type: text/markdown


# pandas_scd

  
Executes slowly changing dimension (SCD) type 2 on pandas dataframes or parquet files.

  

**pandas_scd2 arguments:**

 - **src:** pandas dataframe holding the source of the SCD
 - **tgt:** pandas dataframe holding the target of the SCD (the target can be empty)
 - **cols_to_track:** list of columns to track for changes (default: all columns of the source table)
 - **tz:** pytz time zone used for start_ts and end_ts (default: None, which uses local time)


##### The returned dataframe contains the entire target table with the new changes applied, ready for an insert overwrite of the current target table
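The effect of `tz` on the timestamps can be illustrated with a minimal sketch. It assumes the timestamps are produced with `datetime.now(tz)`, which this README does not confirm; `current_ts` is a hypothetical helper and not part of scd2.

```python
from datetime import datetime

import pytz


def current_ts(tz=None):
    """Hypothetical helper: timestamp for start_ts / end_ts."""
    # tz=None yields a naive datetime in local time;
    # a pytz time zone yields a zone-aware datetime.
    return datetime.now(tz)


local_ts = current_ts()                      # naive, local time
utc_ts = current_ts(pytz.timezone("UTC"))    # aware, UTC
```

Mixing naive and zone-aware timestamps in the same column makes comparisons awkward, so it is probably safest to use the same `tz` value on every run against a given target.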





**parquet_scd2 arguments:**

 - **src:** path to the source of the SCD
 - **tgt:** path to the target of the SCD (the target can be empty)
 - **cols_to_track:** list of columns to track for changes (default: all columns of the source table)
 - **tz:** pytz time zone used for start_ts and end_ts (default: None, which uses local time)

##### There is no return value; the tgt path that was provided is overwritten
  
  

## Installation

    pip install scd2

  

## Getting started

*for working with pandas:*  

	from datetime import datetime

	import pandas as pd
	from scd2 import SCD2

	# current target: one active record
	tgt = pd.DataFrame.from_dict({'first_name': ["Chris"], 'last_name': ['Paul'], 'team': ["Clippers"], "start_ts": [datetime(2012, 1, 14, 3, 21, 34)], "end_ts": [None], "is_active": [True]})

	# new source snapshot: the team has changed
	src = pd.DataFrame.from_dict({'first_name': ["Chris"], 'last_name': ['Paul'], 'team': ['Suns']})

	final_df = SCD2().pandas_scd2(src, tgt)

**pandas_scd2 returns a dataframe containing the entire new target**
  
  
  

tgt:

| first_name | last_name | team | start_ts | end_ts | is_active |
|------------|-----------|----------|---------------------|--------|-----------|
| Chris | Paul | Clippers | 2012-01-14 03:21:34 | | True |

  
  
  

src:

  

| first_name | last_name | team |
|------------|-----------|----------|
| Chris | Paul | Suns |

  
  
  
  

final_df:

  

| first_name | last_name | team | start_ts | end_ts | is_active |
|------------|-----------|----------|---------------------|---------------------|-----------|
| Chris | Paul | Clippers | 2012-01-14 03:21:34 | 2018-01-01 00:00:00 | False |
| Chris | Paul | Suns | 2018-01-01 00:00:00 | | True |
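To make the transformation in the tables above concrete, here is a minimal sketch of the SCD type 2 logic in plain pandas. It is not the scd2 package's own implementation: `scd2_sketch` and its `now` parameter are hypothetical, it tracks all source columns, and it assumes the target already has `start_ts`, `end_ts` and `is_active` columns.

```python
from datetime import datetime

import pandas as pd


def scd2_sketch(src, tgt, now=None):
    """Hypothetical SCD type 2 merge; not the scd2 package's code."""
    cols = list(src.columns)          # track every source column
    now = now or datetime.now()       # timestamp used for this run

    is_active = tgt["is_active"].astype(bool)
    active, history = tgt[is_active], tgt[~is_active]

    # Compare the active target rows with the source on the tracked columns.
    merged = active.merge(src.drop_duplicates(), on=cols,
                          how="outer", indicator=True)

    # Present in both: nothing changed, keep as is.
    unchanged = merged[merged["_merge"] == "both"]
    # Only in the target: the record changed or vanished, so close it.
    expired = merged[merged["_merge"] == "left_only"].assign(
        end_ts=now, is_active=False)
    # Only in the source: a new version, so open it.
    inserted = merged[merged["_merge"] == "right_only"].assign(
        start_ts=now, end_ts=pd.NaT, is_active=True)

    result = pd.concat([history, unchanged, expired, inserted],
                       ignore_index=True).drop(columns="_merge")
    result["is_active"] = result["is_active"].astype(bool)
    return result


tgt = pd.DataFrame({"first_name": ["Chris"], "last_name": ["Paul"],
                    "team": ["Clippers"],
                    "start_ts": [datetime(2012, 1, 14, 3, 21, 34)],
                    "end_ts": [None], "is_active": [True]})
src = pd.DataFrame({"first_name": ["Chris"], "last_name": ["Paul"],
                    "team": ["Suns"]})

# A fixed timestamp keeps the example reproducible.
final_df = scd2_sketch(src, tgt, now=datetime(2018, 1, 1))
```

With these inputs the Clippers row is closed at the run timestamp and a new active Suns row is opened, mirroring the `final_df` table. The outer merge with `indicator=True` is one convenient way to classify rows; the package may do this differently.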

  
  
  

*for working with parquet:*

	from scd2 import SCD2

	src_parquet_path = '~/source.parquet'
	tgt_parquet_path = '~/target.parquet'

	SCD2().parquet_scd2(src_parquet_path, tgt_parquet_path)
 

**parquet_scd2 overwrites the current target (tgt_parquet_path)**


 

