Package 'rcppmlpackexamples' reference manual

Title:	Example Use of 'mlpack' from C++ via R
Description:	A Minimal Example Package which demonstrates 'mlpack' use via C++ Code from R.
Authors:	Dirk Eddelbuettel [aut, cre], Authors of mlpack [aut], Constantinos Giachalis [ctb]
Maintainer:	Dirk Eddelbuettel <[email protected]>
License:	GPL (>= 2)
Version:	0.0.1.1
Built:	2026-07-12 05:07:24 UTC
Source:	https://github.com/eddelbuettel/rcppmlpack-examples

Example Use of 'mlpack' from C++ via R

Description

A Minimal Example Package which demonstrates 'mlpack' use via C++ Code from R.

Package Content

Index of help topics:

adaBoost                An AdaBoost classification
covertype_small         Covertype data subset used for classification
datasetExample          Simple example of loading categorical data via
                        'mlpack'
decisionTree            Run a decisionTree classification
hoeffdingTrees          Run a Hoeffding Trees classification
kMeans                  Run a k-means clustering analysis
linearRegression        Run a linear regression with optional ridge
                        regression
loanData                Loan data subset used for default prediction
loanDefaultPrediction   loanDefaultPrediction
logisticRegression      Run logistic regression
logisticRegressionData
                        Logistic regression example data set
randomForest            Run a Random Forest classification
rcppmlpackexamples-package
                        Example Use of 'mlpack' from C++ via R

Maintainer

Dirk Eddelbuettel <[email protected]>

Author(s)

Dirk Eddelbuettel [aut, cre], Authors of mlpack [aut], Constantinos Giachalis [ctb]

An AdaBoost classification

Description

Run AdaBoost using a simple Perceptron model as the weak learner

Usage

adaBoost(dataset, labels, iterations = 100L, tolerance = 2e-10,
  perceptronIter = 400L)

Arguments

dataset

A matrix of explanatory variables, i.e. “features”

labels

A vector of the dependent variable as integer values, i.e. “labels”

iterations

An integer value for the number of iterations

tolerance

A double with the desired tolerance

perceptronIter

An integer value for the number of iterations for the weak learner

Value

A list object

Examples

data(iris)
X <- t(as.matrix(iris[,1:4]))
y <- as.integer(iris[,5]) - 1   # mlpack prefers {0, 1, 2}
adaBoost(X, y)
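
The two preparation steps in the example can be checked with base R alone. The sketch below only inspects the inputs and does not call adaBoost(); it shows that the transposed matrix holds one observation per column and that the shifted labels fall in {0, 1, 2}.

```r
data(iris)
# mlpack expects one observation per column, so the 150 x 4 feature block
# of the data frame becomes a 4 x 150 numeric matrix
X <- t(as.matrix(iris[, 1:4]))
# R codes the factor levels as 1, 2, 3; subtracting one gives 0, 1, 2
y <- as.integer(iris[, 5]) - 1L
dim(X)      # rows are features, columns are observations
table(y)    # counts per label value
```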

Covertype data subset used for classification

Description

A subset of the UCI machine learning data set ‘covertype’ describing forest cover in seven different cover types. This smaller subset contains 100,000 observations and 55 variables. The first 54 variables are explanatory (i.e. “features”), with the last providing the dependent variable (“labels”). The data is in the ‘wide’ 55 x 100,000 format used by mlpack. The dependent variable has been transformed to the range zero to six by subtracting one from the values found in the data file.

Details

The original source of the data is the US Forest Service, and the complete file is part of the UC Irvine machine learning data repository.
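
The ‘wide’ layout can be illustrated without loading the data. The following base R sketch builds a small synthetic stand-in (the object name toy and its random contents are made up for illustration) and splits it into features and labels the same way the examples in this manual index covertype_small.

```r
# Hypothetical stand-in for covertype_small: 55 rows, one column per observation
set.seed(42)
n <- 20
toy <- rbind(matrix(rnorm(54 * n), nrow = 54),   # rows 1..54: features
             sample(0:6, n, replace = TRUE))     # row 55: labels in 0..6
features <- toy[-55, ]   # 54 x n matrix of explanatory variables
labels   <- toy[55, ]    # length-n vector of class labels
dim(features)
range(labels)
```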

Source

https://www.mlpack.org/datasets/covertype-small.csv.gz

References

https://archive.ics.uci.edu/dataset/31/covertype

Simple example of loading categorical data via 'mlpack'

Description

Simple example of loading categorical data via 'mlpack'

Usage

datasetExample()

Value

Nothing is returned; the function is invoked for its side effect.

Run a decisionTree classification

Description

Run decisionTree classifier

Usage

decisionTree(dataset, labels, pct = 0.3, min_leaf_size = 10L,
  minimum_gain_split = 1e-07, maximum_depth = 0L)

Arguments

dataset

A matrix of explanatory variables, i.e. “features”

labels

A vector of the dependent variable as integer values, i.e. “labels”

pct

A numeric value for the percentage of data to be retained for the test set

min_leaf_size

An integer value with the minimum number of elements per leaf

minimum_gain_split

A double with the gain needed to further split the tree

maximum_depth

An integer with the maximum tree depth, default zero means unlimited

Value

A list object

Examples

data(iris)
X <- t(as.matrix(iris[,1:4]))
y <- as.integer(iris[,5]) - 1   # mlpack prefers {0, 1, 2}
decisionTree(X, y)
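
The pct argument controls how much data is held back for testing. How the package performs the split internally is not shown in this manual; the base R sketch below merely illustrates a random hold-out split on data stored one observation per column. The helper name holdoutSplit is hypothetical.

```r
# Hypothetical helper: random hold-out split for column-oriented data
holdoutSplit <- function(dataset, labels, pct = 0.3) {
  n <- ncol(dataset)
  testIdx <- sample(n, round(pct * n))   # columns held back for testing
  list(trainX = dataset[, -testIdx, drop = FALSE],
       trainY = labels[-testIdx],
       testX  = dataset[, testIdx, drop = FALSE],
       testY  = labels[testIdx])
}

data(iris)
X <- t(as.matrix(iris[, 1:4]))
y <- as.integer(iris[, 5]) - 1L
set.seed(123)
parts <- holdoutSplit(X, y, pct = 0.3)
ncol(parts$trainX)   # observations kept for training
ncol(parts$testX)    # observations held back for testing
```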

Run a Hoeffding Trees classification

Description

Run a Hoeffding Trees (Batch) Classifier

Usage

hoeffdingTrees(dataset, labels, pct = 0.3, nclasses = 7L)

Arguments

dataset

A matrix of explanatory variables, i.e. “features”

labels

A vector of the dependent variable as integer values, i.e. “labels”

pct

A numeric value for the percentage of data to be retained for the test set

nclasses

An integer value for the number of distinct values in labels

Details

This function performs a Hoeffding Trees classification.

Value

A list object

Examples

data(covertype_small)                           # see help(covertype_small)
res <- hoeffdingTrees(covertype_small[-55,],    # features (already transposed)
                      covertype_small[55,],     # labels now in [0, 6] range
                      0.3)                      # percentage used for testing
str(res)  # accuracy varies as method is randomized but no seed set here
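
The default nclasses = 7L matches the seven label values of covertype_small. For other data the value presumably has to be supplied by the caller. A base R sketch of deriving and checking it, using a made-up label vector, could look like this.

```r
# Made-up label vector standing in for the last row of a wide data matrix
labels <- c(0, 2, 1, 1, 0, 2, 2, 1)
nclasses <- length(unique(labels))   # number of distinct label values
# labels are assumed to run from 0 to nclasses - 1, as in the examples here
stopifnot(all(labels %in% (0:(nclasses - 1))))
nclasses
```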

Run a k-means clustering analysis

Description

Run a k-means clustering analysis, returning a list of cluster assignments

Usage

kMeans(data, clusters)

Arguments

data

A matrix of data values

clusters

An integer specifying the number of clusters

Details

This function performs a k-means clustering analysis on the given data set.

Value

A list with cluster assignments

Examples

x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
cl <- kMeans(x, 2)

data(trees, package="datasets")
cl2 <- kMeans(t(trees),3)
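
For orientation, base R ships its own stats::kmeans(), which takes one observation per row. The sketch below runs it on the same kind of simulated data as a reference point. It does not call the package's kMeans(), and cluster numbering between implementations need not agree.

```r
set.seed(1)   # k-means starts from random centres
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
ref <- stats::kmeans(x, centers = 2)   # rows are observations here
table(ref$cluster)                     # cluster sizes
ref$centers                            # one row per cluster centre
```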

Run a linear regression with optional ridge regression

Description

Run a linear regression (with optional ridge regression)

Usage

linearRegression(matX, vecY, lambda = 0, intercept = TRUE)

Arguments

matX

A matrix of explanatory variables (‘predictors’) in standard R format (i.e. ‘tall and skinny’), to be transposed internally to MLPACK format (i.e. ‘short and wide’).

vecY

A vector of dependent variables (‘responses’)

lambda

An optional ridge parameter, defaults to zero

intercept

An optional boolean switch for including an intercept, defaults to true.

Details

This function performs a linear regression, and serves as a simple test case for accessing an MLPACK function.

Value

A vector with fitted values

Examples

suppressMessages(library(utils))
data("trees", package="datasets")
X <- with(trees, cbind(log(Girth), log(Height)))
y <- with(trees, log(Volume))
lmfit <- lm(y ~ X)
# summary(fitted(lmfit))
mlfit <- linearRegression(X, y)
# summary(mlfit)
all.equal(unname(fitted(lmfit)),  as.vector(mlfit))
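
With lambda = 0 the fit is ordinary least squares, which is why the example can be compared with lm(). For reference, the textbook closed-form ridge estimate, (X'X + lambda I)^{-1} X'y, can be written in a few lines of base R. This is a sketch of the formula, not the mlpack code path; the helper name ridgeFit is hypothetical, and penalising the intercept column along with the slopes is a simplifying assumption.

```r
# Hypothetical helper: closed-form ridge fit returning coefficients and fitted values
ridgeFit <- function(matX, vecY, lambda = 0, intercept = TRUE) {
  Xd <- if (intercept) cbind(1, matX) else matX   # prepend a column of ones
  beta <- solve(crossprod(Xd) + lambda * diag(ncol(Xd)),
                crossprod(Xd, vecY))              # (X'X + lambda I)^{-1} X'y
  list(coefficients = drop(beta), fitted = drop(Xd %*% beta))
}

data("trees", package = "datasets")
X <- with(trees, cbind(log(Girth), log(Height)))
y <- with(trees, log(Volume))
ols   <- ridgeFit(X, y)               # lambda = 0: ordinary least squares
ridge <- ridgeFit(X, y, lambda = 1)   # lambda > 0: shrunken coefficients
all.equal(unname(fitted(lm(y ~ X))), ols$fitted)
```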

Loan data subset used for default prediction

Description

A four column data set containing a binary variable ‘Employed’ (with zero denoting unemployment and one employment), a numeric variable ‘Bank Balance’, a numeric variable ‘Annual Salary’ and a binary target variable ‘Defaulted?’ (with zero denoting loan repayment and one denoting default).

Details

The original source of the data is not documented by mlpack.

Source

https://datasets.mlpack.org/LoanDefault.csv

References

https://archive.ics.uci.edu/dataset/31/covertype

loanDefaultPrediction

Description

Predict loan default using a decision tree model

Usage

loanDefaultPrediction(loanDataFeatures, loanDataTargets, pct = 0.25)

Arguments

loanDataFeatures

A matrix of dimension 3 by N, i.e. transposed relative to what R uses, with the three explanatory variables

loanDataTargets

A vector of (integer-valued) binary values denoting loan repayment or default

pct

A numeric variable with the percentage of data to be used for testing, defaults to 25%

Details

This function performs a loan default prediction, using three variables (employment, bank balance and annual salary) to predict loan repayment or default.

Value

A list object with predictions, probabilities, accuracy and a report matrix

Examples

data(loanData)
res <- loanDefaultPrediction(t(as.matrix(loanData[,-4])),  # col 1 to 3, transposed
                             loanData[, 4],                # col 4 is the target
                             0.25)                         # retain 25% for testing
str(res)
res$report
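
The accuracy and report entries summarise how predictions compare with the held-out targets. The exact layout of res$report is not specified in this manual. As an illustration only, accuracy and a confusion matrix for made-up binary vectors can be computed in base R as follows.

```r
# Made-up test-set targets and predictions (0 = repaid, 1 = default)
actual    <- c(0, 0, 1, 1, 0, 1, 0, 0)
predicted <- c(0, 0, 1, 0, 0, 1, 1, 0)
accuracy  <- mean(predicted == actual)   # share of correct predictions
confusion <- table(actual = actual, predicted = predicted)
accuracy
confusion
```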

Run logistic regression

Description

Run a logistic regression returning classification

Usage

logisticRegression(data, labels, lambda = 0)

Arguments

data

A matrix of data values

labels

A vector of class labels

lambda

An optional L2 regularization parameter, defaults to zero

Details

This function performs a logistic regression on the given data set. The example data set is synthetic and follows an online example, retrieved on 2025-10-28, that gave no direct source (it was provided by Google / Gemini); it is now included in the examples directory of the package.

Value

A list with predictions, probabilities and parameters

Examples

data(logisticRegressionData)
X <- as.matrix(logisticRegressionData[, 1:2])
y <- as.matrix(logisticRegressionData[, 3])
res <- logisticRegression(X, y)
res$parameters
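
A fitted logistic regression turns a linear score into a probability through the logistic function. The sketch below applies that step by hand to made-up numbers. The parameter values are invented, and the layout (intercept first, then one weight per feature) is an assumption rather than something this manual states about res$parameters. The toy matrix keeps one observation per row for readability.

```r
# Invented parameters: intercept followed by two weights (assumed ordering)
params <- c(-1, 2, 0.5)
X <- rbind(c(0, 0),
           c(1, 0),
           c(0, 2))                           # three observations, two features
score <- params[1] + drop(X %*% params[-1])   # linear predictor
prob  <- 1 / (1 + exp(-score))                # logistic function, same as plogis(score)
pred  <- as.integer(prob >= 0.5)              # classify at the 0.5 threshold
cbind(score, prob, pred)
```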

Logistic regression example data set

Description

A three column (synthetic) data set to illustrate logistic regression.

Run a Random Forest classification

Description

Run a Random Forest Classifier

Usage

randomForest(dataset, labels, pct = 0.3, nclasses = 7L, ntrees = 10L)

Arguments

dataset

A matrix of explanatory variables, i.e. “features”

labels

A vector of the dependent variable as integer values, i.e. “labels”

pct

A numeric value for the percentage of data to be retained for the test set

nclasses

An integer value for the number of distinct values in labels

ntrees

An integer value for the number of trees

Details

This function performs a Random Forest classification on a subset of the standard ‘covertype’ data set

Value

A list object

Examples

data(covertype_small)                         # see help(covertype_small)
res <- randomForest(covertype_small[-55,],    # features (already transposed)
                    covertype_small[55,],     # labels now in [0, 6] range
                    0.3)                      # percentage used for testing
str(res)  # accuracy varies as method is randomized but no seed set here
