Why Use Dataprep?
Anyone who has done data analysis knows how laborious data reporting can be:
- Writing a lot of pandas code just to explore the data
- Remembering matplotlib's many configuration options
- Writing several helper functions for data quality analysis
- Ending up with reports that do not look professional enough
Dataprep is designed to address these pain points. It lets you:
- Generate a professional data report with a single line of code
- Run data quality analysis automatically
- Generate data visualizations automatically
- See detailed statistical information
- Get started quickly
Installation
First, install the package with pip:
pip install dataprep
The Simplest Example
from dataprep.eda import create_report
import pandas as pd
# Create sample data
data = {
    'Name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu', 'Qian Qi'],
    'Age': [25, 30, 22, 35, 28],
    'Salary': [8000, 12000, 9000, 15000, 10000],
    'Department': ['Technical', 'Sales', 'Technical', 'Sales', 'Technical'],
}
df = pd.DataFrame(data)
# Generate report and save to file
report = create_report(df, title='Employee Data Analysis Report')
report.save('employee_report.html')
print("The report has been generated. Please open the employee_report.html file in your browser to view it.")
It's that simple! Run this code, and you will get a professional report containing the following content:
- Data overview and basic statistics
- Variable distribution analysis
- Missing value analysis
- Outlier detection
- Correlation analysis
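To make the list above more concrete, here is a rough pandas-only sketch of the kind of checks such a report automates. It is not how Dataprep computes them internally: the `quick_checks` helper name is hypothetical, and the 1.5 * IQR outlier rule is an assumption chosen for illustration.

```python
import pandas as pd


def quick_checks(df: pd.DataFrame) -> dict:
    """Hypothetical helper: hand-rolled versions of a few report sections."""
    numeric = df.select_dtypes(include="number")

    # Assumed outlier rule: values more than 1.5 * IQR outside the quartiles
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

    return {
        "overview": df.describe(include="all"),  # basic statistics per column
        "missing": df.isna().sum(),              # missing values per column
        "outliers": outside.sum(),               # flagged values per numeric column
        "correlation": numeric.corr(),           # Pearson correlation matrix
    }


# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu', 'Qian Qi'],
    'Age': [25, 30, 22, 35, 28],
    'Salary': [8000, 12000, 9000, 15000, 10000],
    'Department': ['Technical', 'Sales', 'Technical', 'Sales', 'Technical'],
})

checks = quick_checks(df)
print(checks["missing"])
print(checks["outliers"])
print(checks["correlation"])
```

Even this small sketch takes a dozen lines and produces only plain-text tables, which is the manual work the one-line report is meant to replace.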