@Dre1k23
Last active April 22, 2024 14:59
Some explanations that may help
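import pandas as pd

# Convert all string values to lowercase; non-string values are left unchanged.
# DataFrame.map requires pandas 2.1+; on older versions use DataFrame.applymap.
df = df.map(lambda x: x.lower() if isinstance(x, str) else x)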
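A minimal sketch of the lowercase step on a toy frame, assuming you want the same code to run on pandas versions both before and after 2.1. The helper name `lowercase_strings` and the sample columns are hypothetical, not part of the original gist.

```python
import pandas as pd


def lowercase_strings(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame with every str cell lowercased; other cells are untouched."""
    def to_lower(value):
        return value.lower() if isinstance(value, str) else value

    # DataFrame.map was added in pandas 2.1; older releases only offer applymap.
    if hasattr(pd.DataFrame, "map"):
        return frame.map(to_lower)
    return frame.applymap(to_lower)


# Hypothetical sample data: one text column with a missing value, one numeric column.
sample = pd.DataFrame({"city": ["Paris", "ROME", None], "visits": [1, 2, 3]})
lowered = lowercase_strings(sample)
```

The `isinstance` check is what keeps numbers and missing values from raising an `AttributeError`.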
# Convert date-like columns to the datetime dtype.
df['your column name'] = pd.to_datetime(df['your column name'])
# Keep a copy from before factorization; it is used later for the missing-value ratios.
dfq = df.copy()
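If a date column may contain values that cannot be parsed, one option is `errors="coerce"`, which turns them into `NaT` instead of raising. This is a sketch under that assumption; the column name `signup_date` and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical column with two valid dates and one unparseable value.
sample = pd.DataFrame({"signup_date": ["2024-04-20", "2024-04-22", "not a date"]})

# errors="coerce" replaces unparseable entries with NaT rather than raising an error.
sample["signup_date"] = pd.to_datetime(sample["signup_date"], errors="coerce")

# Count how many values failed to parse so they are not silently ignored.
n_unparsed = int(sample["signup_date"].isna().sum())
```

Coerced values then show up as missing in the null checks further down, so it is worth counting them right after the conversion.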
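# Encode the text (object) columns as integer codes so the attributes share one data type.
# Note: pd.factorize encodes missing values as -1, so they no longer count as nulls.
column_factorize = df.select_dtypes(include='object')
df2 = column_factorize.apply(lambda x: pd.factorize(x)[0])
# Re-attach the non-text columns you want to keep (for example, the datetime ones).
df = pd.concat([df2, df[['your column name', 'your column name']]], axis=1)
df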
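To see what `pd.factorize` does to a single column, here is a small sketch on made-up data. It returns the integer codes together with the unique values, and by default it marks missing entries with the sentinel `-1`. The series `colors` is hypothetical.

```python
import pandas as pd

# Hypothetical text column with one missing value.
colors = pd.Series(["red", "blue", None, "red"])

# factorize returns (codes, uniques); codes follow order of first appearance.
codes, uniques = pd.factorize(colors)

# Missing values get the sentinel -1, so they are no longer NaN after encoding.
missing_mask = codes == -1
```

Because of the `-1` sentinel, a null count taken after factorization reports zero for these columns, which is a reason to keep an unencoded copy of the frame for missing-value checks.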
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Check how many nulls each column has.
# Use dfq (the frame saved before factorization), because factorize replaces missing values with -1.
dfq.isnull().sum()
# Flag columns whose share of missing values exceeds a critical threshold (30% here).
critical_nulls = 0.3
missing_ratios = dfq.isnull().mean()
critical_columns = missing_ratios[missing_ratios > critical_nulls]
if not critical_columns.empty:
    print("Critical columns:")
    print(critical_columns)
else:
    print("No critical columns.")
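The same check can be wrapped in a small function so the threshold is easy to change. This is only a sketch; the function name `critical_null_columns` and the sample frame are assumptions made for illustration.

```python
import pandas as pd


def critical_null_columns(frame: pd.DataFrame, threshold: float = 0.3) -> pd.Series:
    """Return the missing-value ratio of each column whose ratio exceeds threshold."""
    # isnull() gives booleans; their mean per column is the share of missing values.
    ratios = frame.isnull().mean()
    return ratios[ratios > threshold]


# Hypothetical data: "a" is 50% missing, "b" has no gaps, "c" is 25% missing.
sample = pd.DataFrame({
    "a": [1, None, None, 4],
    "b": [1, 2, 3, 4],
    "c": [None, 2, 3, 4],
})
flagged = critical_null_columns(sample)
```

With the default threshold of 0.3, only column `a` is flagged here; `c` stays below the cutoff.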
# Summarize every column: the count row shows the number of non-null values,
# which helps judge whether the remaining share of missing values is acceptable.
df.describe(include="all")

Dre1k23 commented Apr 20, 2024

I hope this helps you to work with data.
