Compare two data frames (or objects coercible to data frames) and produce a dataCompareR object containing
details of the matching and mismatching elements of the data. See vignette("dataCompareR")
for more details.
rCompare(dfA, dfB, keys = NA, roundDigits = NA, mismatches = NA, trimChars = FALSE)
dfA | data frame. The first data object. dataCompareR will attempt to coerce all data objects to data frames. |
---|---|
dfB | data frame. The second data object. dataCompareR will attempt to coerce all data objects to data frames. |
keys | Character. Name(s) of the identifier column(s) used to match rows of dfA and dfB. Use NA for no identifier (rows are then compared by row order), a single string for one column name, or a character vector of column names to match on multiple columns. |
roundDigits | Integer. If NA, numerics are not rounded before comparison. If specified, numerics are rounded to the specified number of decimal places using round. |
mismatches | Integer. The maximum number of mismatches to assess, after which dataCompareR will stop (without producing a dataCompareR object). Designed to improve performance for large data sets. |
trimChars | Logical. If TRUE, strings and factors have whitespace trimmed before comparison. |
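As a rough illustration of the arguments above, the sketch below builds two small data frames and compares them on a key column with rounding and whitespace trimming switched on. The data frames and their column names (id, value, label) are invented for this example. The base R lines only mirror what roundDigits and trimChars are documented to do; they are not the package's internal code. The rCompare call runs only if dataCompareR is installed.

```r
# Hypothetical example data; the column names id, value and label are invented.
dfA <- data.frame(id = c(1, 2, 3),
                  value = c(1.001, 2.002, 3.003),
                  label = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

# Same records in reverse order, with small numeric noise and padded strings.
dfB <- data.frame(id = c(3, 2, 1),
                  value = c(3.0031, 2.0021, 1.0011),
                  label = c("c ", " b", "a"),
                  stringsAsFactors = FALSE)

# Base R view of the documented cleaning steps (illustration only):
# roundDigits = 2 rounds numerics with round(); trimChars = TRUE trims whitespace.
rounded_equal <- all(round(dfA$value, 2) == round(rev(dfB$value), 2))
trimmed_equal <- all(trimws(dfA$label) == trimws(rev(dfB$label)))

# Run the actual comparison only when the package is available.
if (requireNamespace("dataCompareR", quietly = TRUE)) {
  comp <- dataCompareR::rCompare(dfA, dfB,
                                 keys = "id",       # match rows on id, not row order
                                 roundDigits = 2,   # round numerics to 2 decimal places
                                 trimChars = TRUE)  # trim whitespace in strings
  print(comp)
}
```

Because the rows of dfB are in a different order, matching on keys = "id" rather than on row order is what lets corresponding records be compared with each other.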
A dataCompareR object: an S3 object containing details of the comparison between the two data objects. It can be used with summary, print, saveReport and generateMismatchData.
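A minimal sketch of working with the returned object follows, using only print and summary. The class name checked here, dataCompareRobject, is an assumption inferred from the method names print.dataCompareRobject() and summary.dataCompareRobject(). The comparison runs only if dataCompareR is installed.

```r
# Sketch only: runs the comparison when dataCompareR is installed.
pkg_available <- requireNamespace("dataCompareR", quietly = TRUE)

if (pkg_available) {
  iris2 <- iris[1:130, ]   # drop the last 20 rows
  iris2[1, 1] <- 5.2       # introduce one mismatching value

  # No keys supplied, so rows are compared by row order.
  comp <- dataCompareR::rCompare(iris, iris2)

  # Assumption: the S3 class name is inferred from the method names
  # print.dataCompareRobject() and summary.dataCompareRobject().
  is_compare_object <- inherits(comp, "dataCompareRobject")

  print(comp)                    # short overview of matches and mismatches
  comp_summary <- summary(comp)  # detailed summary object
  print(comp_summary)
}
```

Storing the result once and passing it to print or summary avoids rerunning the comparison for each view of the results.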
Other dataCompareR.functions: generateMismatchData(), print.dataCompareRobject(), saveReport(), summary.dataCompareRobject()
iris2 <- iris
iris2 <- iris2[1:130,]
iris2[1,1] <- 5.2
iris2[2,1] <- 5.2
rCompare(iris, iris2, keys = NA)
#>
#> All columns were compared, 20 row(s) were dropped from comparison
#> There are 1 mismatched variables:
#> First and last 5 observations for the 1 mismatched variables
#>   rowNo valueA valueB     variable  typeA  typeB diffAB
#> 1     1    5.1    5.2 SEPAL.LENGTH double double   -0.1
#> 2     2    4.9    5.2 SEPAL.LENGTH double double   -0.3
compDetails <- rCompare(iris, iris2, keys = NA, trimChars = TRUE)
#>
print(compDetails)
#> All columns were compared, 20 row(s) were dropped from comparison
#> There are 1 mismatched variables:
#> First and last 5 observations for the 1 mismatched variables
#>   rowNo valueA valueB     variable  typeA  typeB diffAB
#> 1     1    5.1    5.2 SEPAL.LENGTH double double   -0.1
#> 2     2    4.9    5.2 SEPAL.LENGTH double double   -0.3
summary(compDetails)
#>
#>
#> Data Comparison
#> ===============
#>
#> Date comparison run: 2021-09-05 16:53:48
#> Comparison run on R version 4.1.1 (2021-08-10)
#> With dataCompareR version 0.1.3
#>
#>
#> Meta Summary
#> ============
#>
#>
#> |Dataset Name |Number of Rows |Number of Columns |
#> |:------------|:--------------|:-----------------|
#> |iris         |150            |5                 |
#> |iris2        |130            |5                 |
#>
#>
#> Variable Summary
#> ================
#>
#> Number of columns in common: 5
#> Number of columns only in iris: 0
#> Number of columns only in iris2: 0
#> Number of columns with a type mismatch: 0
#> No match key used, comparison is by row
#>
#>
#>
#> Row Summary
#> ===========
#>
#> Total number of rows read from iris: 150
#> Total number of rows read from iris2: 130
#> Number of rows in common: 130
#> Number of rows dropped from iris: 20
#> Number of rows dropped from iris2: 0
#>
#>
#> Data Values Comparison Summary
#> ==============================
#>
#> Number of columns compared with ALL rows equal: 4
#> Number of columns compared with SOME rows unequal: 1
#> Number of columns with missing value differences: 0
#>
#> Columns with all rows equal : PETAL.LENGTH, PETAL.WIDTH, SEPAL.WIDTH, SPECIES
#>
#> Summary of columns with some rows unequal:
#>
#>
#>
#> |Column       |Type (in iris) |Type (in iris2) | # differences|Max difference | # NAs|
#> |:------------|:--------------|:---------------|-------------:|:--------------|-----:|
#> |SEPAL.LENGTH |double         |double          |             2|0.3            |     0|
#>
#>
#>
#> Unequal column details
#> ======================
#>
#>
#>
#> #### Column - SEPAL.LENGTH
#>
#>
#>
#> |   | SEPAL.LENGTH (iris)| SEPAL.LENGTH (iris2)|Type (iris) |Type (iris2) | Difference|
#> |:--|-------------------:|--------------------:|:-----------|:------------|----------:|
#> |1  |                 5.1|                  5.2|double      |double       |       -0.1|
#> |2  |                 4.9|                  5.2|double      |double       |       -0.3|
#>
#>
pressure2 <- pressure
pressure2[2,2] <- pressure2[2,2] + 0.01
rCompare(pressure2, pressure2, keys = 'temperature')
#>
#> Warning: `arrange_()` was deprecated in dplyr 0.7.0.
#> Please use `arrange()` instead.
#> See vignette('programming') for more help
#> All columns were compared, all rows were compared
#> All compared variables match
#> Number of rows compared: 19
#> Number of columns compared: 2
rCompare(pressure2, pressure2, keys = 'temperature', mismatches = 10)
#>
#> All columns were compared, all rows were compared
#> All compared variables match
#> Number of rows compared: 19
#> Number of columns compared: 2