Creating Data Frames
Basic Creation with data.frame()
The primary function for creating a Data Frame in R is data.frame(). This function combines vectors or lists of equal length into a table-like structure where each vector or list becomes a column.
Example
# Create vectors for each column
names <- c("Alice", "Bob", "Charlie")
ages <- c(25, 30, 35)
cities <- c("Paris", "London", "Berlin")

# Create a Data Frame
df <- data.frame(Name = names, Age = ages, City = cities)

# Print the Data Frame
print(df)

# Output:
#      Name Age   City
# 1   Alice  25  Paris
# 2     Bob  30 London
# 3 Charlie  35 Berlin
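The equal-length requirement has one nuance: data.frame() recycles a shorter atomic vector when its length divides the number of rows evenly, and raises an error otherwise. The following is a minimal sketch of that behavior; the column names (id, label, group) are illustrative and not part of the example above.

```r
# Columns of equal length combine without complaint.
ok <- data.frame(id = 1:3, label = c("a", "b", "c"))

# A length-1 vector is recycled to fill every row.
recycled <- data.frame(id = 1:3, group = "x")
print(recycled)

# Lengths that do not divide evenly raise an error.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  data.frame(id = 1:3, label = c("a", "b")),
  error = function(e) conditionMessage(e)
)
print(result)
```

Wrapping the failing call in tryCatch() is only there to keep the sketch runnable from top to bottom; in ordinary code the error would simply stop execution.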
Creating Data Frames from Lists
You can also create Data Frames from lists where each element of the list is a vector representing a column.
Example
# Create a list of vectors
data_list <- list(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  City = c("Paris", "London", "Berlin")
)

# Create a Data Frame from the list
df_from_list <- data.frame(data_list)

# Print the Data Frame
print(df_from_list)

# Output:
#      Name Age   City
# 1   Alice  25  Paris
# 2     Bob  30 London
# 3 Charlie  35 Berlin
Using read.csv() for Data Frame Creation
Data Frames can also be created by importing data from external files, such as CSV files, using the read.csv() function.
Example
Assume you have a CSV file named data.csv with the following content:
Name,Age,City
Alice,25,Paris
Bob,30,London
Charlie,35,Berlin
You can read this CSV file into a Data Frame:
# Read data from CSV file into a Data Frame
df_from_csv <- read.csv("data.csv")

# Print the Data Frame
print(df_from_csv)

# Output:
#      Name Age   City
# 1   Alice  25  Paris
# 2     Bob  30 London
# 3 Charlie  35 Berlin
Specifying Column Types
When creating Data Frames, you can specify column types if needed. This is especially useful when reading from files or when you need to ensure data types are correctly interpreted.
Example
# Create Data Frame with specified column types
df_specified <- data.frame(
  Name = as.character(c("Alice", "Bob", "Charlie")),
  Age = as.integer(c(25, 30, 35)),
  City = as.factor(c("Paris", "London", "Berlin"))
)

# Print the Data Frame and check column types
print(df_specified)
str(df_specified)

# Output of str(df_specified):
# 'data.frame': 3 obs. of 3 variables:
#  $ Name: chr "Alice" "Bob" "Charlie"
#  $ Age : int 25 30 35
#  $ City: Factor w/ 3 levels "Berlin","London",..: 3 2 1
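When reading from a file, column types can be set at import time through the colClasses argument of read.csv(). The sketch below writes a small temporary CSV so it runs on its own; the file path and the variable names csv_path and df_typed are illustrative assumptions.

```r
# Write a small CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("Name,Age,City", "Alice,25,Paris", "Bob,30,London", "Charlie,35,Berlin"),
  csv_path
)

# colClasses assigns a type to each column, matched here by column name.
df_typed <- read.csv(
  csv_path,
  colClasses = c(Name = "character", Age = "integer", City = "factor")
)

# Check that each column has the requested type.
str(df_typed)

# Remove the temporary file.
unlink(csv_path)
```

Declaring types up front avoids a separate conversion step after the import and makes a mismatch between the file and the expected types fail early.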
Handling Row Names
You can specify row names when creating a Data Frame. This is useful for labeling rows with meaningful identifiers.
Example
# Create Data Frame with row names
df_with_rownames <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  City = c("Paris", "London", "Berlin"),
  row.names = c("Row1", "Row2", "Row3")
)

# Print the Data Frame
print(df_with_rownames)

# Output:
#         Name Age   City
# Row1   Alice  25  Paris
# Row2     Bob  30 London
# Row3 Charlie  35 Berlin
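Row names are useful mainly because rows can then be selected by label. A minimal sketch, using an illustrative Data Frame named people:

```r
people <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  row.names = c("Row1", "Row2", "Row3")
)

# Inspect the current row labels.
print(rownames(people))

# Select a whole row, or a single cell, by label.
row2 <- people["Row2", ]
charlie_age <- people["Row3", "Age"]
print(row2)
print(charlie_age)

# Replace the labels after creation.
rownames(people) <- c("a", "b", "c")
print(rownames(people))
```

Row names must be unique, so they suit identifiers such as IDs or codes rather than values that may repeat.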
Creating Empty Data Frames
You might need to create an empty Data Frame to populate it later.
Example
# Create an empty Data Frame with typed, zero-length columns
empty_df <- data.frame(
  Name = character(),
  Age = numeric(),
  City = factor(),
  stringsAsFactors = FALSE  # Keep Name as character; City is explicitly declared as a factor
)

# Print the empty Data Frame
print(empty_df)

# Output:
# [1] Name Age  City
# <0 rows> (or 0-length row.names)
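One way to populate an empty Data Frame later is to append rows with rbind(). The sketch below assumes a hypothetical scores table; the names scores and new_rows are illustrative.

```r
# Start with an empty, typed Data Frame.
scores <- data.frame(
  Name = character(),
  Score = numeric(),
  stringsAsFactors = FALSE
)

# Rows to add, each a one-row Data Frame with matching column names.
new_rows <- list(
  data.frame(Name = "Alice", Score = 90, stringsAsFactors = FALSE),
  data.frame(Name = "Bob", Score = 85, stringsAsFactors = FALSE)
)

# Append the rows one at a time.
for (row in new_rows) {
  scores <- rbind(scores, row)
}

print(nrow(scores))
str(scores)
```

Appending row by row copies the Data Frame on every iteration, so for larger inputs it is usually preferable to collect the pieces in a list and combine them once, for example with do.call(rbind, new_rows).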
Data Frame with Mixed Data Types
A Data Frame can hold columns with different data types, which makes it very versatile.
Example
# Create a Data Frame with mixed data types
mixed_df <- data.frame(
  Name = c("Alice", "Bob"),
  Age = c(25, 30),
  Employed = c(TRUE, FALSE),
  Height = c(5.5, 6.0),
  stringsAsFactors = FALSE
)

# Print the Data Frame
print(mixed_df)

# Output:
#    Name Age Employed Height
# 1 Alice  25     TRUE    5.5
# 2   Bob  30    FALSE    6.0
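To confirm which type each column holds, one option is to apply class() to every column with sapply(). A short sketch, using an illustrative Data Frame named mixed:

```r
mixed <- data.frame(
  Name = c("Alice", "Bob"),
  Age = c(25, 30),
  Employed = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# sapply() walks over the columns and returns one class name per column.
col_types <- sapply(mixed, class)
print(col_types)
```

Each column is homogeneous on its own (character, numeric, logical); only the Data Frame as a whole mixes types.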
Factors in Data Frames
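Factors are used to handle categorical data. Before R 4.0.0, data.frame() converted character columns to factors by default; since R 4.0.0 the default is stringsAsFactors = FALSE, so character columns stay as character unless you set stringsAsFactors = TRUE or create the factor explicitly.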
Example
# Create a Data Frame with factors
df_factors <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  City = c("Paris", "London", "Berlin"),
  stringsAsFactors = TRUE  # Convert characters to factors
)

# Print the Data Frame and check factor levels
print(df_factors)
str(df_factors)

# Output of str(df_factors):
# 'data.frame': 3 obs. of 2 variables:
#  $ Name: Factor w/ 3 levels "Alice","Bob",..: 1 2 3
#  $ City: Factor w/ 3 levels "Berlin","London",..: 3 2 1
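A factor stores each value as an integer code that points into a sorted set of levels, which is what the numbers at the end of a str() line represent. The sketch below uses a standalone factor named cities (an illustrative name) to show the levels, the codes, and two common follow-up operations.

```r
cities <- factor(c("Paris", "London", "Berlin", "Paris"))

# Levels are the distinct values, sorted alphabetically by default.
city_levels <- levels(cities)
print(city_levels)

# Each element is stored as the position of its level.
city_codes <- as.integer(cities)
print(city_codes)

# Count how often each category occurs.
city_counts <- table(cities)
print(city_counts)

# Convert back to plain text when categories are no longer needed.
city_text <- as.character(cities)
print(city_text)
```

Because the codes follow the sorted levels rather than the order of appearance, "Paris" maps to 3 here even though it comes first in the data.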
Data Frames with List Columns
Data Frames can have columns that are lists, enabling the storage of more complex data structures.
Example
# Create a Data Frame with list columns
list_df <- data.frame(
  Name = c("Alice", "Bob"),
  Scores = I(list(c(90, 85), c(88, 92))),  # Use I() to preserve lists
  stringsAsFactors = FALSE
)

# Print the Data Frame
print(list_df)

# Output:
#    Name Scores
# 1 Alice 90, 85
# 2   Bob 88, 92
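Once a list column exists, its cells are read with [[ ]] and summarized with functions such as vapply(). A minimal sketch, assuming an illustrative Data Frame named list_scores:

```r
list_scores <- data.frame(
  Name = c("Alice", "Bob"),
  Scores = I(list(c(90, 85), c(88, 92))),
  stringsAsFactors = FALSE
)

# Extract the full vector stored in the first row of the list column.
first_scores <- list_scores$Scores[[1]]
print(first_scores)

# Compute one summary value per row; vapply() checks that each result
# is a single number.
avg_scores <- vapply(list_scores$Scores, mean, numeric(1))
print(avg_scores)
```

Using vapply() rather than sapply() is a design choice here: it fails loudly if any cell does not reduce to a single numeric value.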