This tutorial will discuss how to merge data in R data frames. DataFrame in R is a data structure that stores data in rows and columns with multiple data type elements. We can create a dataframe in R by using data.frame() function.
Also Read: Matrices in R | How to Create, Update, and Modify R Matrices
How to merge data in R
We can use data.frame() to create a dataframe.
Syntax:
data.frame(data)
Where data can be a vector with items.
Example 1: We will create a data frame with 4 rows and 3 columns in this example.
#create dataframe with plant details my_dataframe1=data.frame(plant_id=c(1,2,3,4), plant_name=c("rose","lotus","jasmine","rose"), plant_color=c("pink","white","white","red")) #display print(my_dataframe1)
Output:
plant_id plant_name plant_color 1 1 rose pink 2 2 lotus white 3 3 jasmine white 4 4 rose red
Also Read: 10 Best R Programming Courses On Udemy
Merging data
Merging is also known as joining. Based on columns, we can merge/join the data present in the data frame.
Syntax:
merge(x=dataframe_input1,y=dataframe_input2)
Where,
- dataframe_input1 is the first input dataframe. Also known as the left dataframe.
- dataframe_input2 is the second input dataframe. Also known as the right dataframe.
There are 4 types of merges. Let’s discuss them one by one in detail.
Also Read: How To Learn Data Science [Beginner’s Guide]
1. Inner Join
This will join both the dataframes where matched column elements are present in both the dataframes, like the elements present in the columns. It will not join the unmatched columns.
Syntax:
merge(x=dataframe_input1,y=dataframe_input2)
where,
- dataframe_input1 is the first input dataframe. Also known as left dataframe
- dataframe_input2, is the second input dataframe. Also known as the right dataframe.
Example: We will create 2 dataframes with 4 rows and 3 columns each in this example.
first dataframe has columns – [plant_id,plant_name,plant_color] and second dataframe has columns – [plant_id,plant_name,plant_area] and perform inner join.
#create first dataframe with plant details my_dataframe1=data.frame(plant_id=c(1,2,3,4), plant_name=c("rose","lotus","jasmine","rose"), plant_color=c("pink","white","white","red")) #display print(my_dataframe1) #create second dataframe with plant details my_dataframe2=data.frame(plant_id=c(1,2,8,5), plant_name=c("rose","lotus","marigold","datura"), plant_area=c("TS","AP","AP","Delhi")) #display print(my_dataframe2) #perform inner join print(merge(x=my_dataframe1,y=my_dataframe2))
Output:
plant_id plant_name plant_color 1 1 rose pink 2 2 lotus white 3 3 jasmine white 4 4 rose red plant_id plant_name plant_area 1 1 rose TS 2 2 lotus AP 3 8 marigold AP 4 5 datura Delhi plant_id plant_name plant_color plant_area 1 1 rose pink TS 2 2 lotus white AP
In the above output, we have seen that only two rows match in both the dataframes – plant_id 1 and 2. hence they are merged.
Also Read: 15 Best Python Libraries for Data Science and Analysis
2. Outer Join
This will join both the dataframes where column elements are either matched/unmatched.
Suppose the columns are unmatched concerning the dataframe. It will replace the NA value in the other dataframe. In this way, it will return both the joined dataframes.
Syntax:
merge(x=dataframe_input1,y=dataframe_input2,all=TRUE)
where,
- dataframe_input1 is the first input dataframe. Also known as left dataframe
- dataframe_input2, is the second input dataframe. Also known as the right dataframe.
- All parameter is used to join all dataframes, and it is set to TRUE.
Example: We will create 2 dataframes with 4 rows and 3 columns each in this example.
first dataframe has columns – [plant_id,plant_name,plant_color] and second dataframe has columns – [plant_id,plant_name,plant_area] and perform outer join.
#create first dataframe with plant details my_dataframe1=data.frame(plant_id=c(1,2,3,4), plant_name=c("rose","lotus","jasmine","rose"), plant_color=c("pink","white","white","red")) #display print(my_dataframe1) #create second dataframe with plant details my_dataframe2=data.frame(plant_id=c(1,2,8,5), plant_name=c("rose","lotus","marigold","datura"), plant_area=c("TS","AP","AP","Delhi")) #display print(my_dataframe2) #perform outer join print(merge(x=my_dataframe1,y=my_dataframe2,all=TRUE))
Output:
plant_id plant_name plant_color 1 1 rose pink 2 2 lotus white 3 3 jasmine white 4 4 rose red plant_id plant_name plant_area 1 1 rose TS 2 2 lotus AP 3 8 marigold AP 4 5 datura Delhi plant_id plant_name plant_color plant_area 1 1 rose pink TS 2 2 lotus white AP 3 3 jasmine white <NA> 4 4 rose red <NA> 5 5 datura <NA> Delhi 6 8 marigold <NA> AP
In the above output, we have seen that all the rows from both the dataframes are merged.
Also Read: 10 Best Data Science Coursera Courses For Beginners
3. Left Join
It will join both the dataframe, where it will return all rows from the left/first dataframe and matched columns from the second(right) dataframe concerning the first dataframe.
If the values in the second dataframe are not matched concerning the first dataframe, then NA is filled in the second dataframe.
Syntax:
merge(x=dataframe_input1,y=dataframe_input2,all.x=TRUE)
where,
- dataframe_input1 is the first input dataframe. Also known as left dataframe
- dataframe_input2, is the second input dataframe. Also known as the right dataframe.
- all.x parameter represents the left dataframe, which is set to TRUE.
Example: We will create 2 dataframes with 4 rows and 3 columns each in this example.
first dataframe has columns – [plant_id,plant_name,plant_color] and second dataframe has columns – [plant_id,plant_name,plant_area] and perform left join.
#create first dataframe with plant details my_dataframe1=data.frame(plant_id=c(1,2,3,4), plant_name=c("rose","lotus","jasmine","rose"), plant_color=c("pink","white","white","red")) #display print(my_dataframe1) #create second dataframe with plant details my_dataframe2=data.frame(plant_id=c(1,2,8,5), plant_name=c("rose","lotus","marigold","datura"), plant_area=c("TS","AP","AP","Delhi")) #display print(my_dataframe2) #perform left join print(merge(x=my_dataframe1,y=my_dataframe2,all.x=TRUE))
Output:
plant_id plant_name plant_color 1 1 rose pink 2 2 lotus white 3 3 jasmine white 4 4 rose red plant_id plant_name plant_area 1 1 rose TS 2 2 lotus AP 3 8 marigold AP 4 5 datura Delhi plant_id plant_name plant_color plant_area 1 1 rose pink TS 2 2 lotus white AP 3 3 jasmine white <NA> 4 4 rose red <NA>
Also Read: 11 Best Free Android Apps To Learn Data Science
4. Right Join
It will join both the dataframe, where it will return all rows from the right/second dataframe and matched columns from the first(left) dataframe concerning the second dataframe.
If the values in the first dataframe are not matched for the second dataframe, then NA is filled in the first dataframe.
Syntax:
merge(x=dataframe_input1,y=dataframe_input2,all.y=TRUE)
where,
- dataframe_input1 is the first input dataframe. Also known as left dataframe
- dataframe_input2, is the second input dataframe. Also known as the right dataframe.
- all.y parameter represents the right dataframe and is set to TRUE.
Example: We will create 2 dataframes with 4 rows and 3 columns each in this example.
first dataframe has columns – [plant_id,plant_name,plant_color] and second dataframe has columns – [plant_id,plant_name,plant_area] and perform right join.
#create first dataframe with plant details my_dataframe1=data.frame(plant_id=c(1,2,3,4), plant_name=c("rose","lotus","jasmine","rose"), plant_color=c("pink","white","white","red")) #display print(my_dataframe1) #create second dataframe with plant details my_dataframe2=data.frame(plant_id=c(1,2,8,5), plant_name=c("rose","lotus","marigold","datura"), plant_area=c("TS","AP","AP","Delhi")) #display print(my_dataframe2) #perform right join print(merge(x=my_dataframe1,y=my_dataframe2,all.y=TRUE))
Output:
plant_id plant_name plant_color 1 1 rose pink 2 2 lotus white 3 3 jasmine white 4 4 rose red plant_id plant_name plant_area 1 1 rose TS 2 2 lotus AP 3 8 marigold AP 4 5 datura Delhi plant_id plant_name plant_color plant_area 1 1 rose pink TS 2 2 lotus white AP 3 5 datura <NA> Delhi 4 8 marigold <NA> AP
Also Read: What Is a Business Intelligence Analyst?
Merge row-wise
We can join rows by appending the second dataframe to the first dataframe using the rbind() function.
Syntax:
rbind(my_dataframe1,my_dataframe2)
where,
my_dataframe1 is the first dataframe and my_dataframe2 is the second dataframe.
Example:
In this example, we will join two dataframes using rbind().
#create first dataframe with plant details my_dataframe1=data.frame(plant_id=c(1,2,3,4), plant_name=c("rose","lotus","jasmine","rose"), plant_color=c("pink","white","white","red")) #display print(my_dataframe1) #create second dataframe with plant details my_dataframe2=data.frame(plant_id=c(5,6,7,8), plant_name=c("rose","lotus","marigold","datura"), plant_color=c("pink","white","white","red")) #display print(my_dataframe2) #join rows print(rbind(my_dataframe1,my_dataframe2))
Output:
plant_id plant_name plant_color 1 1 rose pink 2 2 lotus white 3 3 jasmine white 4 4 rose red plant_id plant_name plant_color 1 5 rose pink 2 6 lotus white 3 7 marigold white 4 8 datura red plant_id plant_name plant_color 1 1 rose pink 2 2 lotus white 3 3 jasmine white 4 4 rose red 5 5 rose pink 6 6 lotus white 7 7 marigold white 8 8 datura red
Also Read: 10 Best Data Science Courses On Udemy [2022]
Merge columns-wise.
We can join columns by appending the second dataframe to the first dataframe using the cbind() function.
Syntax:
cbind(my_dataframe1,my_dataframe2)
where,
my_dataframe1 is the first dataframe and my_dataframe2 is the second dataframe.
Example:
In this example, we will join two dataframes using cbind().
#create first dataframe with plant details my_dataframe1=data.frame(plant_id=c(1,2,3,4), plant_name=c("rose","lotus","jasmine","rose"), plant_color=c("pink","white","white","red")) #display print(my_dataframe1) #create second dataframe with plant details my_dataframe2=data.frame(plant_id=c(5,6,7,8), plant_name=c("rose","lotus","marigold","datura"), plant_color=c("pink","white","white","red")) #display print(my_dataframe2) #join by columns print(cbind(my_dataframe1,my_dataframe2))
Output:
plant_id plant_name plant_color 1 1 rose pink 2 2 lotus white 3 3 jasmine white 4 4 rose red plant_id plant_name plant_color 1 5 rose pink 2 6 lotus white 3 7 marigold white 4 8 datura red plant_id plant_name plant_color plant_id plant_name plant_color 1 1 rose pink 5 rose pink 2 2 lotus white 6 lotus white 3 3 jasmine white 7 marigold white 4 4 rose red 8 datura red
Also Read: 10 Best Books On Data Science For Beginners [2022]
Summary
From this tutorial, we discussed how to merge dataframes by row and column. We have also seen different types of joins to merge two dataframes in R.
Hello Friends! I am Himanshu, a hobbyist programmer, tech enthusiast, and digital content creator.
With CodeItBro, my mission is to promote coding and help people from non-tech backgrounds to learn this modern-age skill!