Metadata-Version: 2.1
Name: housing_predictor_devaraj_saravana
Version: 0.0.1
Summary: california's housing price prediction
Author-email: "devaraj.saravana" <devaraj.saravana@tigeranalytics.com>
License: Copyright (c) 2018 The Python Packaging Authority
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Project-URL: Homepage, https://github.com/pypa/sampleproject
Project-URL: Bug Tracker, https://github.com/pypa/sampleproject/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE

# Median housing value prediction

The housing data can be downloaded from https://raw.githubusercontent.com/ageron/handson-ml/master/. The script has codes to download the data. We have modelled the median house value on given housing data.

The following techniques have been used:

 - Linear regression
 - Decision Tree
 - Random Forest with RandomizedSearchCV
 - Random Forest with GridSearchCV

## Steps performed
 - We prepare and clean the data. We check and impute for missing values.
 - Features are generated and the variables are checked for correlation.
 - Multiple sampling techinuqies are evaluated. The data set is split into train and test.
 - All the above said modelling techniques are tried and evaluated.
 - Mean squared error, Root mean squaerd error, Mean absolute error metrics are used to evaluate the model
## To excute the script's
>There are three scripts need to run for evaluating the model

    $ python ingest_data.py -p raw

you can run this script with specifying where you want to place the downloaded data and also with default arguments

    $ python train.py -x housing_prepared.csv -y housing_labes.csv

you can run this script with specifying dependent and independent variables and also with no argument passed

    $ python score.py -m final_model.pkl -d test_set.csv

you can run this script with specifying which ML model want to use and with what dataset to score metrics

