Compare two data frames (or objects coercible to data frames) and produce a dataCompareR object containing details of the matching and mismatching elements of the data. See vignette("dataCompareR") for more details.

rCompare(
  dfA,
  dfB,
  keys = NA,
  roundDigits = NA,
  mismatches = NA,
  trimChars = FALSE
)

Arguments

dfA

data frame. The first data object. dataCompareR will attempt to coerce all data objects to data frames.

dfB

data frame. The second data object. dataCompareR will attempt to coerce all data objects to data frames.

keys

String. Name of the identifier column(s) used to match rows of dfA and dfB. Use NA if there is no identifier (row order will be used instead), a single string for one column name, or a character vector of column names to match on multiple columns.

roundDigits

Integer. If NA, numerics are not rounded before comparison. If specified, numerics are rounded to the specified number of decimal places using round.

mismatches

Integer. The maximum number of mismatches to assess, after which dataCompareR will stop (without producing a dataCompareR object). Designed to improve performance for large data sets.

trimChars

Boolean. If TRUE, strings and factors have whitespace trimmed before comparison.
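
As a minimal sketch of how keys, roundDigits and trimChars can be combined, the call below compares two small, made-up data frames. The data and the column names id, score and label are assumptions for illustration only; just rCompare and its arguments come from this page. The rows of the second frame are deliberately in a different order, so matching relies on the key rather than on row position.

```r
library(dataCompareR)

# Hypothetical example data: same records, different row order,
# tiny numeric noise in `score` and trailing whitespace in `label`.
dfA <- data.frame(
  id    = c(1, 2, 3),
  score = c(1.001, 2.000, 3.000),
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
dfB <- data.frame(
  id    = c(3, 2, 1),
  score = c(3.000, 2.000, 1.002),
  label = c("c ", "b", "a"),
  stringsAsFactors = FALSE
)

# Match rows on `id`, round numerics to 2 decimal places and
# trim whitespace from strings before comparing.
cmp <- rCompare(dfA, dfB, keys = "id", roundDigits = 2, trimChars = TRUE)

summary(cmp)
```

Passing a character vector such as keys = c("id", "date") would match on several columns at once, provided both data frames contain them.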

Value

A dataCompareR object: an S3 object containing details of the comparison between the two data objects. Can be used with summary, print, saveReport and generateMismatchData.
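
The following sketch shows one way the returned object might be passed on to the functions named above. It assumes that generateMismatchData takes the comparison object followed by the two original data frames and returns a list of mismatched rows; check that function's own help page before relying on this.

```r
library(dataCompareR)

# Introduce a single known difference into a copy of iris.
iris2 <- iris
iris2[1, 1] <- 5.2

# No key is given, so rows are compared in order.
cmp <- rCompare(iris, iris2)

# Short overview and full report, both printed to the console.
print(cmp)
summary(cmp)

# Assumed usage: extract the mismatched rows from each input.
mismatched <- generateMismatchData(cmp, iris, iris2)
str(mismatched, max.level = 1)
```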

Examples

iris2 <- iris
iris2 <- iris2[1:130,]
iris2[1,1] <- 5.2
iris2[2,1] <- 5.2
rCompare(iris,iris2,key=NA)
#> Running rCompare...
#> All columns were compared, 20 row(s) were dropped from comparison
#> There are 1 mismatched variables:
#> First and last 5 observations for the 1 mismatched variables
#>   rowNo valueA valueB     variable  typeA  typeB diffAB
#> 1     1    5.1    5.2 SEPAL.LENGTH double double   -0.1
#> 2     2    4.9    5.2 SEPAL.LENGTH double double   -0.3
compDetails <- rCompare(iris,iris2,key=NA, trimChars = TRUE)
#> Running rCompare...
print(compDetails)
#> All columns were compared, 20 row(s) were dropped from comparison
#> There are 1 mismatched variables:
#> First and last 5 observations for the 1 mismatched variables
#>   rowNo valueA valueB     variable  typeA  typeB diffAB
#> 1     1    5.1    5.2 SEPAL.LENGTH double double   -0.1
#> 2     2    4.9    5.2 SEPAL.LENGTH double double   -0.3
summary(compDetails)
#> dataCompareR is generating the summary...
#>
#> Data Comparison
#> ===============
#>
#> Date comparison run: 2021-09-05 16:53:48
#> Comparison run on R version 4.1.1 (2021-08-10)
#> With dataCompareR version 0.1.3
#>
#>
#> Meta Summary
#> ============
#>
#>
#> |Dataset Name |Number of Rows |Number of Columns |
#> |:------------|:--------------|:-----------------|
#> |iris         |150            |5                 |
#> |iris2        |130            |5                 |
#>
#>
#> Variable Summary
#> ================
#>
#> Number of columns in common: 5
#> Number of columns only in iris: 0
#> Number of columns only in iris2: 0
#> Number of columns with a type mismatch: 0
#> No match key used, comparison is by row
#>
#>
#>
#> Row Summary
#> ===========
#>
#> Total number of rows read from iris: 150
#> Total number of rows read from iris2: 130
#> Number of rows in common: 130
#> Number of rows dropped from iris: 20
#> Number of rows dropped from iris2: 0
#>
#>
#> Data Values Comparison Summary
#> ==============================
#>
#> Number of columns compared with ALL rows equal: 4
#> Number of columns compared with SOME rows unequal: 1
#> Number of columns with missing value differences: 0
#>
#> Columns with all rows equal : PETAL.LENGTH, PETAL.WIDTH, SEPAL.WIDTH, SPECIES
#>
#> Summary of columns with some rows unequal:
#>
#>
#>
#> |Column       |Type (in iris) |Type (in iris2) | # differences|Max difference | # NAs|
#> |:------------|:--------------|:---------------|-------------:|:--------------|-----:|
#> |SEPAL.LENGTH |double         |double          |             2|0.3            |     0|
#>
#>
#>
#> Unequal column details
#> ======================
#>
#>
#>
#> #### Column - SEPAL.LENGTH
#>
#>
#>
#> |   | SEPAL.LENGTH (iris)| SEPAL.LENGTH (iris2)|Type (iris) |Type (iris2) | Difference|
#> |:--|-------------------:|--------------------:|:-----------|:------------|----------:|
#> |1  |                 5.1|                  5.2|double      |double       |       -0.1|
#> |2  |                 4.9|                  5.2|double      |double       |       -0.3|
#>
#>
pressure2 <- pressure
pressure2[2,2] <- pressure2[2,2] + 0.01
rCompare(pressure2,pressure2,key='temperature')
#> Running rCompare...
#> Warning: `arrange_()` was deprecated in dplyr 0.7.0.
#> Please use `arrange()` instead.
#> See vignette('programming') for more help
#> All columns were compared, all rows were compared
#> All compared variables match
#> Number of rows compared: 19
#> Number of columns compared: 2
rCompare(pressure2,pressure2,key='temperature', mismatches = 10)
#> Running rCompare...
#> All columns were compared, all rows were compared
#> All compared variables match
#> Number of rows compared: 19
#> Number of columns compared: 2