Getter Module
- class suntzu.getter.Getter
Bases: object
- get_best_dtype(col: str) → str
Determines the most memory-efficient data type for a column based on its values.
The method inspects the column’s current data type and value range to infer a more compact dtype:
- Integers are downcast to the smallest possible integer type.
- Floats are downcast to the smallest possible floating-point type.
- Object columns with a low number of unique values are converted to category.
- Other types are returned unchanged.
- Parameters:
col (str) – Name of the column to analyze.
- Returns:
The name of the most suitable data type for the column.
- Return type:
str
Examples
>>> from suntzu import Getter
>>> Getter.get_best_dtype(df, "age")
'int8'
>>> Getter.get_best_dtype(df, "price")
'float32'
>>> Getter.get_best_dtype(df, "status")
'category'
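The dispatch described above can be sketched in plain Python. This is an illustrative approximation, not the library's source: `best_dtype_sketch`, its `values` and `current_dtype` arguments, and the `max_unique_ratio` threshold of 0.5 are all assumptions, since the documentation does not say what counts as "a low number of unique values".

```python
def best_dtype_sketch(values, current_dtype, max_unique_ratio=0.5):
    """Hypothetical stand-in for get_best_dtype, working on a non-empty list.

    `current_dtype` is the column's dtype name, e.g. "int64" or "object".
    `max_unique_ratio` is an assumed cutoff; the real threshold is not documented.
    """
    if current_dtype.startswith("int"):
        lo, hi = min(values), max(values)
        # Try signed integer widths from narrowest to widest.
        for name, bits in (("int8", 8), ("int16", 16), ("int32", 32)):
            if -2 ** (bits - 1) <= lo and hi <= 2 ** (bits - 1) - 1:
                return name
        return "int64"
    if current_dtype.startswith("float"):
        largest = max(abs(v) for v in values)
        if largest <= 65504.0:  # largest finite float16
            return "float16"
        if largest <= 3.4028234663852886e38:  # largest finite float32
            return "float32"
        return "float64"
    if current_dtype == "object":
        # Few distinct values relative to the row count suggests a category.
        if len(set(values)) / len(values) <= max_unique_ratio:
            return "category"
    # Anything else (bool, datetime, high-cardinality object) is left alone.
    return current_dtype
```

Under these assumptions, a column of ages held as `int64` comes back as `'int8'`, while an object column whose values are all distinct keeps its `'object'` dtype.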
- get_best_float(col_min: float, col_max: float) → str
Determines the most memory-efficient floating-point type capable of representing a range of values.
- Parameters:
col_min (float) – The minimum value in the range.
col_max (float) – The maximum value in the range.
- Returns:
The name of the smallest floating-point type that can accommodate all values in the range. Possible return values are “float16”, “float32”, or “float64”.
- Return type:
str
Examples
>>> from suntzu import Getter
>>> Getter.get_best_float(0.1, 100.0)
'float16'
>>> Getter.get_best_float(-1e5, 1e5)
'float32'
>>> Getter.get_best_float(-1e40, 1e40)
'float64'
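One way to picture the range check is a comparison against the largest finite value of each IEEE 754 type. The sketch below is an assumption about the approach, not the library's code, and `best_float_sketch` is a hypothetical name. It considers magnitude only, so it says nothing about precision: float16 keeps roughly three significant decimal digits, which may matter more than range for real data.

```python
# Largest finite magnitude representable by each type (IEEE 754).
FLOAT_LIMITS = [
    ("float16", 65504.0),
    ("float32", 3.4028234663852886e38),
]


def best_float_sketch(col_min: float, col_max: float) -> str:
    """Hypothetical range-only selection of a floating-point dtype name."""
    largest = max(abs(col_min), abs(col_max))
    for name, limit in FLOAT_LIMITS:
        if largest <= limit:
            return name
    # Anything larger needs double precision.
    return "float64"
```

With these limits, the three documented examples resolve to `'float16'`, `'float32'`, and `'float64'` respectively.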
- get_best_int(col_min: int, col_max: int) → str
Determines the smallest integer type capable of representing a range of values.
- Parameters:
col_min (int) – The minimum value in the range.
col_max (int) – The maximum value in the range.
- Returns:
The name of the smallest integer type that can accommodate all values in the range. Possible return values are “int8”, “int16”, “int32”, or “int64”.
- Return type:
str
Examples
>>> from suntzu import Getter
>>> Getter.get_best_int(-50, 100)
'int8'
>>> Getter.get_best_int(-200, 30000)
'int16'
>>> Getter.get_best_int(-50000, 100000)
'int32'
>>> Getter.get_best_int(-5000000000, 5000000000)
'int64'
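The selection can be illustrated with a table of signed two's-complement ranges, scanned from narrowest to widest. This is a minimal sketch under the assumption that only signed types are considered (the documented return values list no unsigned types); `best_int_sketch` and the `OverflowError` for out-of-range input are hypothetical choices, not documented behavior.

```python
# (dtype name, smallest value, largest value) for signed integers.
INT_RANGES = [
    ("int8", -2 ** 7, 2 ** 7 - 1),
    ("int16", -2 ** 15, 2 ** 15 - 1),
    ("int32", -2 ** 31, 2 ** 31 - 1),
    ("int64", -2 ** 63, 2 ** 63 - 1),
]


def best_int_sketch(col_min: int, col_max: int) -> str:
    """Hypothetical helper: narrowest signed dtype containing [col_min, col_max]."""
    for name, lo, hi in INT_RANGES:
        if lo <= col_min and col_max <= hi:
            return name
    # Assumed behavior: the documentation does not say what happens here.
    raise OverflowError("range does not fit in int64")
```

Both bounds matter: a column spanning 0 to 128 already needs `'int16'`, because 128 exceeds the int8 maximum of 127.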
- get_max_value(col: str) → int | str
Returns the maximum value of a numeric DataFrame column, or the most frequent value of a categorical or boolean column.
- Parameters:
col (str) – The column of the DataFrame to inspect.
- Returns:
For numeric columns, returns the maximum value.
For categorical or boolean columns, returns the most frequent value (mode).
- Return type:
int | str
- Raises:
MixedDtypeError – If the column contains mixed types or null values.
Examples
>>> from suntzu import Getter
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 3, 2], 'b': [True, False, True], 'c': ['x', 'y', 'x']})
>>> Getter.get_max_value(df, 'a')
3
>>> Getter.get_max_value(df, 'b')
True
>>> Getter.get_max_value(df, 'c')
'x'
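The per-type behavior can be sketched on a plain list. This is an illustration only: `max_value_sketch` is a hypothetical name, `ValueError` stands in for `MixedDtypeError` (whose definition is not shown here), and resolving ties in favor of the first value seen is an assumption.

```python
from collections import Counter


def max_value_sketch(values):
    """Hypothetical stand-in for get_max_value, operating on a list."""
    if any(v is None for v in values):
        # Stand-in for MixedDtypeError on null values.
        raise ValueError("column contains null values")
    kinds = {type(v) for v in values}
    if len(kinds) != 1:
        # Stand-in for MixedDtypeError on mixed (or empty) input.
        raise ValueError("column must hold exactly one type")
    kind = kinds.pop()
    if kind in (int, float):
        return max(values)
    # Booleans and strings: most frequent value; ties go to the first seen.
    return Counter(values).most_common(1)[0][0]
```

Note that `type(True)` is `bool`, not `int`, so boolean columns take the mode branch even though `bool` subclasses `int`.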
- get_memory_insights(col: str, total_usage: int) → list
- get_memory_usage(col, unit) → float
Calculates the memory usage of a specific column in the DataFrame.
- Parameters:
col (str) – Name of the column to measure.
unit (str) – Unit for memory measurement. Options are:
- “b” for bytes
- “kb” for kilobytes
- “mb” for megabytes
- Returns:
Memory usage of the specified column, rounded to 2 decimal places.
- Return type:
float
Examples
>>> df.get_memory_usage("age", "kb")
12.5
>>> df.get_memory_usage("price", "mb")
0.01
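The unit handling amounts to dividing a byte count and rounding to two decimals. The sketch below assumes binary units (1 kb = 1024 bytes), which the documentation does not state; `convert_bytes_sketch` and the `ValueError` for unknown units are hypothetical.

```python
# Assumed divisors: binary units, since the documentation does not specify.
UNIT_DIVISORS = {"b": 1, "kb": 1024, "mb": 1024 ** 2}


def convert_bytes_sketch(n_bytes: int, unit: str) -> float:
    """Hypothetical helper: express a byte count in the requested unit."""
    if unit not in UNIT_DIVISORS:
        raise ValueError(f"unknown unit {unit!r}; expected one of {sorted(UNIT_DIVISORS)}")
    return round(n_bytes / UNIT_DIVISORS[unit], 2)
```

Under this assumption a column occupying 12,800 bytes reports 12.5 in "kb"; with decimal units (1 kb = 1000 bytes) the same column would report 12.8.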
- get_min_value(col: str) → int | str
Returns the minimum value of a numeric DataFrame column, or the least frequent value of a categorical or boolean column.
- Parameters:
col (str) – The column of the DataFrame to inspect.
- Returns:
For numeric columns, returns the minimum value.
For categorical or boolean columns, returns the least frequent value.
- Return type:
int | str
- Raises:
MixedDtypeError – If the column contains mixed types or null values.
Examples
>>> from suntzu import Getter
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 3, 2], 'b': [True, False, True], 'c': ['x', 'y', 'x']})
>>> Getter.get_min_value(df, 'a')
1
>>> Getter.get_min_value(df, 'b')
False
>>> Getter.get_min_value(df, 'c')
'y'
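For the categorical and boolean case, "least frequent" can be computed by counting occurrences and taking the smallest count. A minimal sketch follows; `least_frequent_sketch` is a hypothetical name, and returning the first-seen value on a tie is an assumption the documentation does not confirm.

```python
from collections import Counter


def least_frequent_sketch(values):
    """Hypothetical helper: the value with the fewest occurrences in a list."""
    counts = Counter(values)
    # Counter keeps first-seen order, and min() returns the first minimal
    # item, so ties resolve to the value encountered earliest.
    return min(counts.items(), key=lambda item: item[1])[0]
```

When every value occurs equally often, as in `['a', 'b']`, the result depends entirely on this tie-breaking rule.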
- get_total_memory_usage(unit) → float
Calculates the total memory usage of the DataFrame in the specified unit.
- Parameters:
unit (str) – Unit for memory measurement. Options are:
- “b” for bytes
- “kb” for kilobytes
- “mb” for megabytes
- Returns:
Total memory usage of the DataFrame, rounded to 2 decimal places.
- Return type:
float
Examples
>>> df.get_total_memory_usage("kb")
125.5
>>> df.get_total_memory_usage("mb")
0.12