How To Use The cut() Function In R

December 21, 2022 by cloudmin

The R language provides the cut() function, which converts a numeric vector into a factor. This is very useful for categorizing data. This tutorial shows how to use the cut() function in R.

What is the cut() function in R?

The cut() function converts a numeric vector to a factor by dividing the range of the vector into intervals and coding each value according to the interval it falls into.

Syntax:

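cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3, ordered_result = FALSE, ...)

Parameters:

  • x: the numeric vector to cut.
  • breaks: either a single number (2 or greater) giving the number of intervals into which x is to be cut, or a numeric vector of two or more unique cut points.
  • labels: labels for the levels of the result. The default is NULL, which builds labels in interval notation. If set to FALSE, integer codes are returned instead of a factor.
  • include.lowest: whether a value equal to the lowest ‘breaks’ value (or the highest, for ‘right = FALSE’) should be included. The default is FALSE.
  • right: whether the intervals should be closed on the right and open on the left. The default is TRUE.
  • dig.lab: the number of digits used to format the break numbers when labels are not given. The default is 3.
  • ordered_result: whether the result should be an ordered factor. The default is FALSE.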
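The examples later in this tutorial do not cover ‘labels = FALSE’ or ‘ordered_result’, so here is a minimal sketch of both. The vector, the cut points, and the label names "low", "mid", and "high" are made up for illustration.

```r
x <- c(2, 7, 12)

# labels = FALSE returns plain integer codes instead of a factor
codes <- cut(x, breaks = c(0, 5, 10, 15), labels = FALSE)
codes
# [1] 1 2 3

# ordered_result = TRUE returns an ordered factor, so levels can be compared
sizes <- cut(
    x,
    breaks = c(0, 5, 10, 15),
    labels = c("low", "mid", "high"),
    ordered_result = TRUE
)
sizes < "high"
# [1]  TRUE  TRUE FALSE
```

Integer codes are handy when the result only feeds further computation, while an ordered factor is useful when the groups have a natural ranking.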

How to use the cut() function in R?

We create a vector x containing the integers from 1 to 5. Here are some examples of how to use the cut() function to categorize the data in the vector x.

In the first example, we set the ‘breaks’ parameter to 2 (a single number), so the cut() function cuts the vector x into 2 intervals of equal width.

Code:

# Create a vector
x <- 1:5

# Cut the x vector into 2 equal intervals
cut(x, breaks = 2)

Output:

[1] (0.996,3] (0.996,3] (0.996,3] (3,5]     (3,5]    
Levels: (0.996,3] (3,5]
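The lower bound 0.996 in the output above may look odd. When ‘breaks’ is a single number, cut() splits the range of the data into equal pieces and then moves the two outer limits outward by 0.1% of the range, so that the smallest and largest values are not left out. The sketch below reproduces that arithmetic by hand; the variable names are only illustrative.

```r
x <- 1:5

# Width of the data range: 5 - 1 = 4
width <- diff(range(x))

# The outer limits are moved outward by 0.1% of that width
lower <- min(x) - width / 1000
upper <- max(x) + width / 1000
c(lower, upper)
# [1] 0.996 5.004

# Count how many values fall into each of the 2 intervals
groups <- cut(x, breaks = 2)
as.vector(table(groups))
# [1] 3 2
```

The upper limit 5.004 is displayed as 5 in the level label because ‘dig.lab’ formats break numbers with 3 digits by default.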

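In the following code, we set the ‘breaks’ parameter to a numeric vector of 3 cut points, so the cut() function divides the values of x into 2 intervals: (1,2] and (2,5]. The first level corresponds to the leftmost interval. Because the intervals are open on the left by default, the value 1 falls outside both intervals and becomes NA.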

Code:

# Create a vector
x <- 1:5

# Cut the vector x at explicit cut points
cut(x, breaks = c(1, 2, 5))

Output:

[1] <NA>  (1,2] (2,5] (2,5] (2,5]
Levels: (1,2] (2,5]
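A value that lies outside every interval is returned as NA, which is easy to miss in a long vector. The following sketch shows one way to find such values, and one possible fix: lowering the first cut point to 0 (a value chosen here purely for illustration) so that 1 falls inside the first interval.

```r
x <- 1:5
groups <- cut(x, breaks = c(1, 2, 5))

# Values that fell outside every interval come back as NA
x[is.na(groups)]
# [1] 1

# Lowering the first cut point puts 1 inside the interval (0,2]
covered <- cut(x, breaks = c(0, 2, 5))
anyNA(covered)
# [1] FALSE
```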

By default, the ‘right’ parameter is TRUE, meaning that intervals are open on the left and closed on the right. If the ‘right’ parameter is set to FALSE, the intervals are closed on the left and open on the right, so the value 5 now falls outside the last interval and becomes NA.

Code:

# Create a vector
x <- 1:5

# The intervals should be closed on the left
cut(x, breaks = c(1, 2, 5), right = FALSE)

Output:

[1] [1,2) [2,5) [2,5) [2,5) <NA> 
Levels: [1,2) [2,5)
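The effect of ‘right’ is easiest to see on a value that sits exactly on an inner cut point. As a small sketch, the code below cuts the single value 2 both ways with the same cut points as above; the variable names are only illustrative.

```r
breaks <- c(1, 2, 5)

# With the default right = TRUE, 2 is the closed upper end of the first interval
default_side <- as.character(cut(2, breaks))
default_side
# [1] "(1,2]"

# With right = FALSE, 2 is the closed lower end of the second interval
left_closed <- as.character(cut(2, breaks, right = FALSE))
left_closed
# [1] "[2,5)"
```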

By default, labels are written in interval notation, such as (a,b]. In this example, we set our own labels for the levels of the output. Note that the number of labels must equal the number of intervals, which is one less than the number of cut points.

Code:

# Create a vector
x <- 1:5

# Set labels for the levels
cut(x,
    breaks = c(1, 2, 5),
    labels = c("Group1", "Group2")
)

Output:

[1] <NA>   Group1 Group2 Group2 Group2
Levels: Group1 Group2
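To illustrate the rule about the number of labels, here is a hedged sketch that passes one label too many and catches the resulting error with try(). The third label "Group3" is hypothetical and exists only to trigger the mismatch.

```r
x <- 1:5
breaks <- c(1, 2, 5)
labels <- c("Group1", "Group2", "Group3")  # one label too many

# 3 cut points define 2 intervals, so 3 labels do not match
ok <- length(labels) == length(breaks) - 1
ok
# [1] FALSE

# cut() stops with an error when the counts differ
result <- try(cut(x, breaks = breaks, labels = labels), silent = TRUE)
failed <- inherits(result, "try-error")
failed
# [1] TRUE
```

Checking the lengths before calling cut() gives a clearer failure point than waiting for the error.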

If we set the ‘include.lowest’ parameter to TRUE, the cut() function will also include values equal to the lowest ‘breaks’ value (or the highest ‘breaks’ value if ‘right = FALSE’). Here, the value 1 is no longer NA and is placed in Group1.

Code:

# Create a vector
x <- 1:5

# Include values equal to the lowest 'breaks' value
cut(
    x,
    breaks = c(1, 2, 5),
    labels = c("Group1", "Group2"),
    include.lowest = TRUE
)

Output:

[1] Group1 Group1 Group2 Group2 Group2
Levels: Group1 Group2
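The ‘include.lowest’ name is slightly misleading when ‘right = FALSE’: in that case it is the highest ‘breaks’ value that gets included. A short sketch with the same cut points and default labels shows the last interval becoming closed on both ends.

```r
x <- 1:5

# With right = FALSE, include.lowest = TRUE closes the last interval on the right
groups <- cut(x, breaks = c(1, 2, 5), right = FALSE, include.lowest = TRUE)
groups
# [1] [1,2) [2,5] [2,5] [2,5] [2,5]
# Levels: [1,2) [2,5]
```

Compared with the earlier ‘right = FALSE’ output, the value 5 is no longer NA.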

Complete code

We use the cut() function to classify some students’ test scores.

Example:

Name <- c(
    "Frank",
    "Charles",
    "Johnny",
    "Orlando",
    "Bruce",
    "Lynda",
    "Alice",
    "Robin",
    "Charles",
    "Hanna"
)

Scores <- c(75, 40, 39, 5, 67, 90, 55, 78, 0, 86)

Grade <- cut(
    Scores,
    breaks = c(0, 39, 44, 49, 54, 59, 64, 69, 79, 100),
    labels = c("F", "E", "D", "C", "C+", "B", "B+", "A", "A+"),
    include.lowest = TRUE
)

gradePoint <- data.frame(Name, Scores, Grade)
gradePoint

Output:

      Name Scores Grade
1    Frank     75     A
2  Charles     40     E
3   Johnny     39     F
4  Orlando      5     F
5    Bruce     67    B+
6    Lynda     90    A+
7    Alice     55    C+
8    Robin     78     A
9  Charles      0     F
10   Hanna     86    A+

The cut points span scores from 0 to 100, and the labels run from “F” to “A+” in the same order as the intervals. A test score can be 0, so we set ‘include.lowest’ to TRUE. Otherwise a score of 0 would fall outside the first interval (0,39] and become NA.
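Because the result of cut() is a factor, it can be summarized directly. As a possible follow-up to the example above, this sketch counts the students per grade and lists the grades nobody received. It repeats the same scores and cut points so that it runs on its own; the names ‘counts’ and ‘unused’ are only illustrative.

```r
Scores <- c(75, 40, 39, 5, 67, 90, 55, 78, 0, 86)

Grade <- cut(
    Scores,
    breaks = c(0, 39, 44, 49, 54, 59, 64, 69, 79, 100),
    labels = c("F", "E", "D", "C", "C+", "B", "B+", "A", "A+"),
    include.lowest = TRUE
)

# Number of students in each grade, including grades with zero students
counts <- table(Grade)

# Number of students who received an "F"
sum(Grade == "F")
# [1] 3

# Grades that no student received
unused <- levels(Grade)[as.vector(counts) == 0]
unused
# [1] "D" "C" "B"
```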

Summary

This tutorial showed how to use the cut() function in R to turn a numeric vector into a factor. If you want the output to include values equal to the lowest ‘breaks’ value (or the highest with ‘right = FALSE’), you must set ‘include.lowest’ to TRUE. Thank you for reading.
