Runs a series of data quality checks on a data frame. The checks cover data structure, missing values, data accuracy, negative values, outliers, sample size, duplicate rows, and duplicate columns.
check_data_quality(df)
df: A data frame to check.
Prints a message reporting the result of each data quality check.
df <- data.frame(w = c(7, 8, 180, 7), x = c("a", "b", "c", "a"),
                 y = c(4, NA, -6, 4), z = c(7, 8, 180, 7))
# Check the data quality of the example dataframe
check_data_quality(df)
#> Number of rows: 4
#>
#> Number of columns: 4
#>
#> Column names: w, x, y, z
#>
#> Column data types: numeric, character, numeric, numeric
#>
#> Number of missing values: 1
#>
#> Missing values found in the following columns:
#> y: 1
#>
#> Data frame contains negative values.
#>
#> Extreme values found in the numerical columns of the data.
#>
#> Missing values detected in the data frame.
#>
#> Duplicate rows found:
#> Row 4 is a duplicate of row 1
#>
#> Duplicate columns found:
#> Column 'z' is a duplicate of column 'w'
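
Several of the results reported above can be cross-checked by hand with base R. The following is a minimal sketch for illustration only; it is not necessarily how check_data_quality() is implemented, and the variable names (na_counts, has_negative, dup_rows, dup_cols) are hypothetical. The outlier check is left out because the rule the function uses is not stated here.

```r
# Same example data frame as above
df <- data.frame(w = c(7, 8, 180, 7), x = c("a", "b", "c", "a"),
                 y = c(4, NA, -6, 4), z = c(7, 8, 180, 7))

# Missing values per column, keeping only columns with at least one NA
na_counts <- colSums(is.na(df))
na_counts <- na_counts[na_counts > 0]

# Is there any negative value in a numeric column? (NAs are ignored)
num_cols <- vapply(df, is.numeric, logical(1))
has_negative <- any(df[num_cols] < 0, na.rm = TRUE)

# Indices of rows that repeat an earlier row
dup_rows <- which(duplicated(df))

# Names of columns whose contents repeat an earlier column
dup_cols <- names(df)[duplicated(as.list(df))]

print(na_counts)     # y: 1
print(has_negative)  # TRUE
print(dup_rows)      # 4
print(dup_cols)      # "z"
```

These values agree with the messages in the example output: one missing value in y, a negative value present, row 4 duplicating row 1, and column z duplicating column w.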