Creating Data Frames with R

Creating Data Frames

Basic Creation with data.frame()

The primary function for creating a Data Frame in R is data.frame(). This function combines vectors or lists of equal length into a table-like structure where each vector or list becomes a column.

Example 

# Create vectors for each column
names <- c("Alice", "Bob", "Charlie")
ages <- c(25, 30, 35)
cities <- c("Paris", "London", "Berlin")
# Create a Data Frame
df <- data.frame(Name = names, Age = ages, City = cities)
# Print the Data Frame
print(df)
# Output:
#      Name Age   City
# 1   Alice  25  Paris
# 2     Bob  30 London
# 3 Charlie  35 Berlin
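
The equal-length requirement can be checked directly. The following is a minimal sketch (the variable names ok and result are made up for this illustration): columns of matching length combine cleanly, while lengths that cannot be reconciled make data.frame() stop with an error, which tryCatch() captures here.

```r
# Columns of equal length combine without complaint
ok <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35)
)
nrow(ok)
# [1] 3

# Lengths 3 and 2 cannot be reconciled, so data.frame() raises an error;
# tryCatch() turns that error into a plain string for inspection
result <- tryCatch(
  data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(25, 30)),
  error = function(e) "error"
)
result
# [1] "error"
```

Note that a length-one value is an exception: data.frame() recycles it to fill the whole column.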

Creating Data Frames from Lists

You can also create Data Frames from lists where each element of the list is a vector representing a column.

Example 

# Create a list of vectors
data_list <- list(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  City = c("Paris", "London", "Berlin")
)
# Create a Data Frame from the list
df_from_list <- data.frame(data_list)
# Print the Data Frame
print(df_from_list)
# Output:
#      Name Age   City
# 1   Alice  25  Paris
# 2     Bob  30 London
# 3 Charlie  35 Berlin
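
As a small variation, a named list can also be passed to as.data.frame(). The sketch below assumes a shortened two-column list (sample_list and df_alt are illustrative names, not part of the example above) and shows that the list names become the column names.

```r
# A named list whose elements all have the same length
sample_list <- list(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35)
)

# as.data.frame() converts the list; each element becomes a column
df_alt <- as.data.frame(sample_list)

names(df_alt)
# [1] "Name" "Age"
dim(df_alt)
# [1] 3 2
```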

Using read.csv() for Data Frame Creation

Data Frames can also be created by importing data from external files, such as CSV files, using the read.csv() function.

Example

Assume you have a CSV file named data.csv with the following content: 

# Name,Age,City
# Alice,25,Paris
# Bob,30,London
# Charlie,35,Berlin

You can read this CSV file into a Data Frame: 

# Read data from CSV file into a Data Frame
df_from_csv <- read.csv("data.csv")
# Print the Data Frame
print(df_from_csv)
# Output:
#      Name Age   City
# 1   Alice  25  Paris
# 2     Bob  30 London
# 3 Charlie  35 Berlin
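
If you do not have a data.csv file at hand, the round trip can be tried with a temporary file. This is a sketch under that assumption: csv_path and df_tmp are illustrative names, and the file content mirrors the sample shown above.

```r
# Write the sample content to a temporary file so the working directory is untouched
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("Name,Age,City", "Alice,25,Paris", "Bob,30,London", "Charlie,35,Berlin"),
  csv_path
)

# Read it back; the first line is used as the header by default
df_tmp <- read.csv(csv_path)

dim(df_tmp)
# [1] 3 3
class(df_tmp$Age)  # whole numbers are read as integer
# [1] "integer"

# Remove the temporary file
unlink(csv_path)
```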

Specifying Column Types

When creating Data Frames, you can specify column types if needed. This is especially useful when reading from files or when you need to ensure data types are correctly interpreted.

Example 

# Create Data Frame with specified column types
df_specified <- data.frame(
  Name = as.character(c("Alice", "Bob", "Charlie")),
  Age = as.integer(c(25, 30, 35)),
  City = as.factor(c("Paris", "London", "Berlin"))
)
# Print the Data Frame and check column types
print(df_specified)
str(df_specified)
# Output of str(df_specified) :
# 'data.frame':   3 obs. of  3 variables:
#  $ Name: chr  "Alice" "Bob" "Charlie"
#  $ Age : int  25 30 35
#  $ City: Factor w/ 3 levels "Berlin","London",..: 3 2 1
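
When reading from a file, column types can be requested through the colClasses argument of read.csv(). The sketch below reads from an inline string via the text argument instead of a real file; csv_text and df_typed are names chosen for this illustration.

```r
# Inline CSV content standing in for a file
csv_text <- "Name,Age,City
Alice,25,Paris
Bob,30,London
Charlie,35,Berlin"

# Request a type for each column by name
df_typed <- read.csv(
  text = csv_text,
  colClasses = c(Name = "character", Age = "numeric", City = "factor")
)

# Inspect the resulting class of every column:
# Name is character, Age is numeric, City is factor
col_classes <- sapply(df_typed, class)
col_classes
```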

Handling Row Names

You can specify row names when creating a Data Frame. This is useful for labeling rows with meaningful identifiers.

Example 

# Create Data Frame with row names
df_with_rownames <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  City = c("Paris", "London", "Berlin"),
  row.names = c("Row1", "Row2", "Row3")
)
# Print the Data Frame
print(df_with_rownames)
# Output:
#         Name Age   City
# Row1   Alice  25  Paris
# Row2     Bob  30 London
# Row3 Charlie  35 Berlin
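
Row names are mainly useful for looking rows up by label. A minimal sketch, using an illustrative Data Frame called staff:

```r
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  row.names = c("Row1", "Row2", "Row3")
)

# List the row names
rownames(staff)
# [1] "Row1" "Row2" "Row3"

# Select a single cell by row name and column name
staff["Row2", "Age"]
# [1] 30
```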

Creating Empty Data Frames

You might need to create an empty Data Frame to populate it later.

Example 

# Create an empty Data Frame with specified columns
empty_df <- data.frame(
  Name = character(),
  Age = numeric(),
  City = factor(),
  stringsAsFactors = FALSE  # Keep Name as character (City is a factor because factor() is used explicitly)
)
# Print the empty Data Frame
print(empty_df)
# Output:
# [1] Name Age  City
# <0 rows> (or 0-length row.names)
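
One way to populate an empty Data Frame later is to append rows with rbind(). This is a sketch, not the only approach; the people Data Frame and its two columns are assumptions made for the illustration.

```r
# Start with zero rows but fixed column names and types
people <- data.frame(
  Name = character(),
  Age = numeric(),
  stringsAsFactors = FALSE
)

# Append one-row Data Frames with matching column names
people <- rbind(people, data.frame(Name = "Alice", Age = 25, stringsAsFactors = FALSE))
people <- rbind(people, data.frame(Name = "Bob", Age = 30, stringsAsFactors = FALSE))

nrow(people)
# [1] 2
people$Age
# [1] 25 30
```

Appending row by row is fine for a handful of rows; for large amounts of data it is usually faster to build the columns first and call data.frame() once.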

Data Frame with Mixed Data Types

# Create a Data Frame with mixed data types
mixed_df <- data.frame(
  Name = c("Alice", "Bob"),
  Age = c(25, 30),
  Employed = c(TRUE, FALSE),
  Height = c(5.5, 6.0),
  stringsAsFactors = FALSE
)
# Print the Data Frame
print(mixed_df)
# Output:
#    Name Age Employed Height
# 1 Alice  25     TRUE    5.5
# 2   Bob  30    FALSE    6.0

Each column of a Data Frame holds values of a single type, but different columns can hold different types. Here Name is character, Age and Height are numeric, and Employed is logical.
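
The type of each column can be confirmed with sapply() and class(). A minimal sketch, using an illustrative Data Frame named mixed:

```r
mixed <- data.frame(
  Name = c("Alice", "Bob"),
  Age = c(25, 30),
  Employed = c(TRUE, FALSE),
  Height = c(5.5, 6.0),
  stringsAsFactors = FALSE
)

# Apply class() to every column; the result is a named character vector
col_types <- sapply(mixed, class)

col_types[["Name"]]
# [1] "character"
col_types[["Employed"]]
# [1] "logical"
col_types[["Height"]]
# [1] "numeric"
```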

Factors in Data Frames

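Factors are used to handle categorical data. Before R 4.0.0, data.frame() converted character columns to factors by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, so you need to set stringsAsFactors = TRUE (or call factor() on a column) to get factors.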

Example 

# Create a Data Frame with factors
df_factors <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  City = c("Paris", "London", "Berlin"),
  stringsAsFactors = TRUE  # Convert characters to factors
)
# Print the Data Frame and check factor levels
print(df_factors)
str(df_factors)
# Output of str(df_factors) :
# 'data.frame':   3 obs. of  2 variables:
#  $ Name: Factor w/ 3 levels "Alice","Bob",..: 1 2 3
#  $ City: Factor w/ 3 levels "Berlin","London",..: 3 2 1
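
Factor levels are sorted alphabetically by default, which can be inspected with levels(). The sketch below also shows converting a single column with factor() instead of converting every character column; cities and visitors are illustrative names.

```r
# Convert all character columns at creation time
cities <- data.frame(
  City = c("Paris", "London", "Berlin"),
  stringsAsFactors = TRUE
)
levels(cities$City)
# [1] "Berlin" "London" "Paris"
nlevels(cities$City)
# [1] 3

# Convert only one column, leaving the others as character
visitors <- data.frame(
  Name = c("Alice", "Bob"),
  City = c("Paris", "London"),
  stringsAsFactors = FALSE
)
visitors$City <- factor(visitors$City)
is.factor(visitors$City)
# [1] TRUE
is.character(visitors$Name)
# [1] TRUE
```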

Data Frames with List Columns

Data Frames can have columns that are lists, enabling the storage of more complex data structures.

Example 

# Create a Data Frame with list columns
list_df <- data.frame(
  Name = c("Alice", "Bob"),
  Scores = I(list(c(90, 85), c(88, 92))),  # Use I() to preserve lists
  stringsAsFactors = FALSE
)
# Print the Data Frame
print(list_df)
# Output:
#    Name Scores
# 1 Alice 90, 85
# 2   Bob 88, 92
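
Each cell of a list column holds a whole vector, so elements are extracted with [[ ]] and summarised with functions such as sapply(). A minimal sketch, using an illustrative Data Frame named scores_df:

```r
scores_df <- data.frame(
  Name = c("Alice", "Bob"),
  Scores = I(list(c(90, 85), c(88, 92))),  # I() keeps the list as a single column
  stringsAsFactors = FALSE
)

# Extract the full score vector for the first row
scores_df$Scores[[1]]
# [1] 90 85

# Compute one summary value per row
mean_scores <- sapply(scores_df$Scores, mean)
mean_scores
# [1] 87.5 90.0
```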
