Metadata-Version: 2.1
Name: csvquery
Version: 1.1.2
Summary: A versatile Python package that allows you to execute MongoDB-style queries on CSV files and interact with them like SQL tables.
Home-page: https://github.com/hycsg/csvquery
Author: Houston Youth Computer Science Group
Author-email: houstoncsgroup@gmail.com
License: UNKNOWN
Description: # csvquery
        
        A versatile Python package that allows you to execute MongoDB-style queries on CSV files and interact with them like SQL tables.
        
        ## Downloading
        
        ```
        pip install csvquery
        ```
        
        ## Package contents
        
        ### open_csv(str path[, str delimiter = ","])
        Produces a **Dataset** from a CSV file:
        ```python
        from csvquery import open_csv
        
        dataset = open_csv("path/to/file.csv")
        ```
        
        ### get_csv(str url[, str delimiter = ","])
        Produces a **Dataset** from a URL:
        ```python
        from csvquery import get_csv
        
        dataset = get_csv("http://example.com/api/data.csv")
        ```
        
        ### parse_csv(str string[, str delimiter = ","])
        Produces a **Dataset** from a string:
        ```python
        from csvquery import parse_csv
        
        string = "name,age\nJohn,34\nJane,33"
        dataset = parse_csv(string)
        ```
        
        
        
        
        ### Operators
        
        Stores all valid query operator keywords as attributes. Using this class is optional; you can use the keyword strings directly instead.
        
        - **equal** = "eq"
        - **not_equal** = "neq"
        - **less_than** = "lt"
        - **greater_than** = "gt"
        - **less_than_or_equal** = "lte"
        - **greater_than_or_equal** = "gte"
        - **inside** = "in"
        - **_not** = "not"
        - **_and** = "and"
        - **_or** = "or"
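        
        Since each attribute is just a string constant, the class behaves like a plain namespace. The sketch below is illustrative only (not the package's actual source) and shows why an attribute and its raw keyword string are interchangeable:
        ```python
        # Illustrative sketch of the Operators namespace; the real class
        # ships with csvquery, so this merely stands in for it here.
        class Operators:
            equal = "eq"
            not_equal = "neq"
            less_than = "lt"
            greater_than = "gt"
            less_than_or_equal = "lte"
            greater_than_or_equal = "gte"
            inside = "in"
            _not = "not"
            _and = "and"
            _or = "or"
        
        # The attribute and the raw keyword string build identical filters:
        filter_a = {"age": {Operators.greater_than_or_equal: 18}}
        filter_b = {"age": {"gte": 18}}
        ```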
        
        
        
        
        ### Comparisons
        
        Stores some common comparison operators.
        
        #### integers
        A lambda function to compare integer values.
        ```python
        data.index("age", Comparisons.integers)
        ```
        
        #### floats
        A lambda function to compare floating-point values.
        ```python
        data.index("rate", Comparisons.floats)
        ```
        
        #### strings
        A lambda function to compare strings alphabetically.
        ```python
        data.index("name", Comparisons.strings)
        ```
        
        #### default
        An alias for the **floats** comparison.
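        
        The package ships its own lambdas, but conceptually **floats** (and therefore **default**) is something like the following sketch, which is an assumption for illustration, not the package source:
        ```python
        # Assumed shape of the floats comparison: True when a < b numerically.
        # Illustrative only; csvquery provides the real lambda.
        floats = lambda a, b: float(a) < float(b)
        default = floats  # "default" is simply another name for floats
        
        # A numeric comparison matters because CSV values arrive as strings,
        # and plain string ordering would put "10" before "2":
        numeric = floats("2", "10")      # compares 2.0 < 10.0
        lexicographic = "2" < "10"       # compares character by character
        ```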
        
        #### get_date_comparison(str format_string)
        Returns a function that compares dates based on the format string. See https://strftime.org/ for a list of all valid date codes.
        ```python
        data.index("date", Comparisons.get_date_comparison("%Y-%m-%d"))
        ```
        
        
        
        
        ## The Dataset object
        
        The **Dataset** object is similar to an SQL table. It can be obtained with the **open_csv**, **get_csv**, and **parse_csv** functions.
        
        ### data
        A two-dimensional list of the data.
        ```python
        for row in voter_dataset.data:
            print(row[0])
            ...
        ```
        
        ### fields
        A list of the dataset's fields, or column names.
        ```python
        for field in voter_dataset.fields:
            print(field)
            ...
        ```
        
        ### index(str field[, func comparison_operation = Comparisons.default])
        Sorts the rows of data based on the values in a specified field. Sorting the data is optional, but doing so allows binary searches, which have a time complexity of just **O(log(n))**. The **comparison_operation** argument must be a function that returns **True** when the first argument is less than the second argument, and **False** otherwise. Import **Comparisons** for some common comparison operations. By default, **comparison_operation** is a floating-point comparison.
        ```python
        from csvquery import open_csv, Comparisons
        
        dataset = open_csv("people.csv")
        dataset.index("age", Comparisons.integers) # sorts people by ascending age
        ```
        You can also make your own comparison operator.
        ```python
        dataset.index("age", lambda a, b: a**2 < b**2)
        ```
        
        ### already_indexed(str field[, func comparison_operation = Comparisons.default])
        Specifies that the data is already sorted by a certain field, allowing binary searches without re-sorting.
        ```python
        from csvquery import open_csv, Comparisons
        
        dataset = open_csv("people.csv")
        dataset.already_indexed("name", Comparisons.strings)
        ```
        
        ### query(dict filter_object)
        Returns all rows that match the **filter_object** as another **Dataset**.
        ```python
        from csvquery import open_csv
        
        dataset = open_csv("people.csv")
        dataset.index("age")
        
        voter_dataset = dataset.query({
            "age": {          # this filter will run as a binary search since we indexed the data by age
                "gte": 18     # the query will only return people whose age is greater than or equal to 18
            },
            "citizenship": {  # this will run after the binary search to filter the narrowed-down data
                "eq": "USA"   # people only pass this filter if their "citizenship" field is equal to "USA"
            }
        })
        ```
        The general structure of a **filter_object** is as follows:
        ```python
        {
            "field_1": {
                "operator_1": "value",
                "operator_2": "value",
                "operator_3": {
                    "nested_operator": "value"
                },
                ...
                "operator_n": "value_n"
            },
            "field_2": {
                ...
            },
            ...
            "field_3": {
                ...
            }
        }
        ```
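        Because a **filter_object** is ordinary nested dictionaries, it can also be assembled programmatically. The helper below is hypothetical (plain Python, no library calls), and the nested **not** usage is an assumption based on the operator list above:
        ```python
        # Hypothetical helper: builds a filter matching low <= value < high.
        def range_filter(field, low, high):
            return {field: {"gte": low, "lt": high}}
        
        working_age = range_filter("age", 18, 65)
        # Nesting "not" around another operator (assumed semantics):
        working_age["name"] = {"not": {"eq": "John"}}
        ```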
        You can also use the **csvquery.Operators** class instead of operator strings:
        ```python
        from csvquery import Operators
        
        voters_named_john = voter_dataset.query({
            "name": {
                Operators.equal: "John"
            }
        })
        ```
        If you want to use a comparison operator like **gt** or **lte** on a column that was not indexed, you need to provide a comparison operator in the **filter_object** like so:
        ```python
        from csvquery import open_csv, Comparisons
        
        dataset = open_csv("people.csv")
        dataset.index("citizenship") # sorts people by citizenship
        
        voter_dataset = dataset.query({
            "citizenship": { # binary search
                "eq": "USA"
            },
            "age": {  # not a binary search
                "gte": "18",
                "comparison": Comparisons.integers
            }
        })
        ```
        
        ### query_one(dict filter_object)
        Returns the first row that matches the **filter_object** as a **Dataset**:
        ```python
        john_doe = people_dataset.query_one({"phone":"555-123-4567"})
        ```
        
        ### select(list fields)
        Returns a new **Dataset** object with only the specified fields.
        ```python
        names_and_ages = people.select(["name", "age"])
        ```
        
        ### select_as(dict fields)
        Returns a new **Dataset** object with only the specified fields, renamed according to the **fields** dictionary.
        ```python
        names_and_ages = people.select_as({
            "first_and_last_name": "name",
            "years_of_oldness": "age"
        })
        ```
        
        ### select_unique(str field)
        Returns a new **Dataset** object containing only the specified field, with duplicate values removed so that each value appears once.
        ```python
        names = people.select_unique("name")
        ```
        
        ### add_field(str field[, func derivation = lambda r:""])
        Adds another field with the specified name. By default, the field will be filled with blank values.
        ```python
        people.add_field("status")
        ```
        You can optionally specify a function that takes the data in that row as a dictionary and outputs the new derived value per row.
        ```python
        people.add_field("full_name", lambda row: row["first_name"] + " " + row["last_name"])
        ```
        
        ### remove_fields(list fields)
        Removes the specified fields from the **Dataset**.
        ```python
        people.remove_fields(["status", "full_name"])
        ```
        
        ### rename_fields(dict fields)
        Renames fields according to the **fields** dictionary argument.
        ```python
        people.rename_fields({
            "first_and_last_name": "name",
            "years_of_oldness": "age"
        })
        ```
        
        ### replace(list fields, func function)
        Replaces the values in the specified **fields** list argument using the **function** argument, which takes the current value as input and outputs the new value.
        ```python
        people.replace(["first name", "last name"], lambda v: v.lower()) # makes all "first name" and "last name" values lower case
        ```
        
        ### replace_derived(list fields, func derivation)
        Replaces the values in the specified **fields** list argument using the **derivation** argument, a function that takes the row as a dictionary and outputs the new value.
        ```python
        from datetime import datetime
        
        def birthday_to_age(row):
            bday = datetime.strptime(row["date_of_birth"], "%Y-%m-%d")
            today = datetime.today()
            return today.year - bday.year - ((today.month, today.day) < (bday.month, bday.day))
        
        people.replace_derived(["age"], birthday_to_age)
        ```
        
        ### join(Dataset other_dataset, list common_fields[, bool remove = True])
        Adds fields from **other_dataset** to this **Dataset** and matches the rows by referencing the **common_fields** list argument. The **common_fields** list must have the field from the current **Dataset** and the field from the other **Dataset** in that order. By default, this method will remove the common field after the operation, but you can prevent this by setting the **remove** argument to **False**.
        ```python
        locations = open_csv("locations.csv") # has an id field
        people = open_csv("people.csv") # has a location_id field that matches people to locations
        
        people.join(locations, ["location_id", "id"])
        ```
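        As a rough model of the semantics (plain Python over lists of dicts, not the library's implementation): each row is matched to the other dataset's row with the same key value, the other row's fields are appended, and by default the shared key is dropped:
        ```python
        # Rough model of join semantics; csvquery's real join works on its
        # own Dataset internals, so this is illustrative only.
        def join_rows(rows, other_rows, left_key, right_key, remove=True):
            lookup = {row[right_key]: row for row in other_rows}
            joined = []
            for row in rows:
                match = lookup[row[left_key]]
                merged = dict(row)
                merged.update({k: v for k, v in match.items() if k != right_key})
                if remove:
                    del merged[left_key]  # default: drop the common field
                joined.append(merged)
            return joined
        
        people_rows = [{"name": "Ann", "location_id": "1"}]
        location_rows = [{"id": "1", "city": "Houston"}]
        ```
        With **remove=True** (the default) the shared "location_id" key disappears from the result; passing **remove=False** keeps it.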
        
        ### to_dictionary()
        Returns the data as a dictionary if the **Dataset** has only one row (as a result of a **query_one** operation, for example).
        ```python
        john_doe = people.query_one({"phone":"555-123-4567"}) # dataset is one row high
        print(john_doe.to_dictionary()["address"])
        ```
        
        ### to_list()
        Returns the data as a list if the **Dataset** has only one column (as a result of a **select** operation, for example).
        ```python
        texans = people.query({"state":"TX"}).select("name") # dataset is one column wide
        texan_names = texans.to_list()
        ```
        
        ### count([list fields])
        If the **fields** argument is left blank, returns the number of rows in the **Dataset**.
        ```python
        number_of_people = people.count()
        ```
        Otherwise, returns the number of rows in which all of the specified fields are not empty.
        ```python
        number_of_people_with_jobs = people.count(["job"]) # assuming the "job" field is left blank for unemployed people
        ```
        
        ### sum(str field)
        Returns the sum of all the values in that field.
        ```python
        total_net_worth = people.sum("net_worth")
        ```
        
        ### average(str field)
        Returns the average of all the values in that field.
        ```python
        average_net_worth = people.average("net_worth")
        ```
        
        ### print_table([list fields])
        Outputs your data to the console in a nice table.
        ```python
        voter_dataset.print_table()
        ```
        You can optionally specify which columns to print.
        ```python
        voter_dataset.print_table(["name", "age"])
        ```
        
        ### save_csv(str filepath[, str delimiter = ","[, fields = <all>]])
        Saves the **Dataset** to a file. If no fields are specified, all fields will be saved.
        ```python
        voter_dataset.save_csv("output.csv", ";", ["name", "age"])
        ```
        
        
        ## More examples
        
        ### SQL equivalent
        
        #### Classic SQL query
        
        ```sql
        SELECT name, age FROM people
        WHERE age >= 18 AND citizenship = 'USA';
        ```
        
        #### Python MongoDB-style query
        
        ```python
        voters = people.query({
            "age": {
                "gte": 18
            },
            "citizenship": "USA"
        }).select(["name", "age"])
        ```
        
        ### Printing certain columns
        
        ```python
        dataset = open_csv("people.csv")
        dataset.print_table(dataset.fields[2:5])
        ```
        
        ### Rewriting a CSV file with fewer columns and a different delimiter
        
        ```python
        dataset = open_csv("people.csv")
        dataset.save_csv("people.csv", ";", dataset.fields[2:5])
        ```
        
        ### The "eq" operator is optional in the top level of the dictionary
        
        ```python
        dataset = csvquery.open_csv("people.csv")
        dataset.query({
            "name":"John"
        })
        ```
        
        ### Selecting one field
        
        ```python
        people.select("name") # doesn't need to be a list if it's just one
        ```
        
        ### Chaining
        
        ```python
        (
        open_csv("people.csv")
            .index("age")
            .query({"age":{"gte":18}, "citizenship":"USA"})
            .select(["name", "id"])
            .save_csv("voters.csv", ",")
            .print_table()
        )
        ```
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Operating System :: OS Independent
Requires-Python: >=3.4
Description-Content-Type: text/markdown
