| Title: | Example Use of 'mlpack' from C++ via R |
|---|---|
| Description: | A Minimal Example Package which demonstrates 'mlpack' use via C++ Code from R. |
| Authors: | Dirk Eddelbuettel [aut, cre], Authors of mlpack [aut], Constantinos Giachalis [ctb] |
| Maintainer: | Dirk Eddelbuettel <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.0.1.1 |
| Built: | 2026-05-27 14:36:08 UTC |
| Source: | https://github.com/eddelbuettel/rcppmlpack-examples |

Index of help topics:

| adaBoost | An AdaBoost classification |
|---|---|
| covertype_small | Covertype data subset used for classification |
| datasetExample | Simple example of loading categorical data via 'mlpack' |
| decisionTree | Run a decisionTree classification |
| kMeans | Run a k-means clustering analysis |
| linearRegression | Run a linear regression with optional ridge regression |
| loanData | Loan data subset used for default prediction |
| loanDefaultPrediction | loanDefaultPrediction |
| logisticRegression | Run logistic regression |
| logisticRegressionData | Logistic regression example data set |
| randomForest | Run a Random Forest classification |
| rcppmlpackexamples-package | Example Use of 'mlpack' from C++ via R |

Run AdaBoost using a simple Perceptron model as the weak learner
adaBoost(dataset, labels, iterations = 100L, tolerance = 2e-10, perceptronIter = 400L)

| dataset | A matrix of explanatory variables, i.e. “features” |
|---|---|
| labels | A vector of the dependent variable as integer values, i.e. “labels” |
| iterations | An integer value for the number of iterations |
| tolerance | A double with the desired tolerance |
| perceptronIter | An integer value for the number of iterations for the weak learner |

A list object
data(iris)
X <- t(as.matrix(iris[,1:4]))
y <- as.integer(iris[,5]) - 1 # mlpack prefers {0, 1, 2}
adaBoost(X, y)
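
The example above transposes the data so that each observation is a column, and shifts the labels so that they start at zero. As a minimal sketch in base R, the helper below checks those two conventions before a call. `check_mlpack_input` is a hypothetical name used for illustration only; it is not part of the package.

```r
# Hypothetical helper (not part of the package): sanity-check inputs that
# are laid out the way the example above prepares them.
check_mlpack_input <- function(X, y) {
  stopifnot(is.matrix(X), is.numeric(X))
  stopifnot(length(y) == ncol(X))             # one label per column (observation)
  stopifnot(all(y == round(y)), min(y) == 0)  # integer-valued labels starting at zero
  invisible(TRUE)
}

data(iris)
X <- t(as.matrix(iris[, 1:4]))   # 4 x 150: features in rows, observations in columns
y <- as.integer(iris[, 5]) - 1L  # factor codes 1..3 shifted to 0..2
check_mlpack_input(X, y)
dim(X)
table(y)
```

The same check applies to the inputs prepared in the `decisionTree` example, which uses the identical layout.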
A subset of the UCI machine learning data set ‘covertype’ describing forest cover type in seven different classes. This smaller subset contains 100,000 observations and 55 variables. The first 54 variables are explanatory (i.e. “features”), with the last providing the dependent variable (“labels”). The data is in the ‘wide’ 55 x 100,000 format used by mlpack. The dependent variable has been transformed to the range zero to six by subtracting one from the values found in the data file.
The original source of the data is the US Forest Service, and the complete file is part of the UC Irvine machine learning data repository.
https://www.mlpack.org/datasets/covertype-small.csv.gz
https://archive.ics.uci.edu/dataset/31/covertype
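
Given the 55 x N layout described above, features and labels can be separated by row index. The sketch below uses a small random stand-in matrix (`toy`, a made-up name) instead of the real data, so it runs without the package installed; only the shape and the label range mirror the description.

```r
set.seed(42)
n <- 20                                           # stand-in for the 100,000 columns
toy <- rbind(matrix(rnorm(54 * n), nrow = 54),    # rows 1..54: features
             sample(0:6, n, replace = TRUE))      # row 55: labels in 0..6

features <- toy[-55, ]   # drop the label row
labels   <- toy[55, ]    # keep only the label row
dim(features)
range(labels)
```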
Simple example of loading categorical data via 'mlpack'
datasetExample()
Nothing is returned; the function is invoked for its side effect.
Run a decision tree classifier
decisionTree(dataset, labels, pct = 0.3, min_leaf_size = 10L, minimum_gain_split = 1e-07, maximum_depth = 0L)

| dataset | A matrix of explanatory variables, i.e. “features” |
|---|---|
| labels | A vector of the dependent variable as integer values, i.e. “labels” |
| pct | A numeric value for the percentage of data to be retained for the test set |
| min_leaf_size | An integer value with the minimum number of elements per leaf |
| minimum_gain_split | A double with the gain needed to further split the tree |
| maximum_depth | An integer with the maximum tree depth, default zero means unlimited |

A list object
data(iris)
X <- t(as.matrix(iris[,1:4]))
y <- as.integer(iris[,5]) - 1 # mlpack prefers {0, 1, 2}
decisionTree(X, y)
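
The `pct` argument is described as the share of the data held out for testing. How the function splits the data internally is not documented here, so the following base R code is only a sketch of one plausible hold-out split on column-oriented data; the variable names are illustrative.

```r
set.seed(123)
data(iris)
X <- t(as.matrix(iris[, 1:4]))   # observations in columns
y <- as.integer(iris[, 5]) - 1L

pct <- 0.3
n <- ncol(X)
test_idx <- sample(n, size = round(pct * n))   # columns held out for testing

X_train <- X[, -test_idx, drop = FALSE]
X_test  <- X[,  test_idx, drop = FALSE]
y_train <- y[-test_idx]
y_test  <- y[test_idx]
c(train = ncol(X_train), test = ncol(X_test))
```

Because observations are columns, the split indexes columns rather than rows.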
Run a k-means clustering analysis, returning a list of cluster assignments
kMeans(data, clusters)

| data | A matrix of data values |
|---|---|
| clusters | An integer specifying the number of clusters |

This function performs a k-means clustering analysis on the given data set.
A list with cluster assignments
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
cl <- kMeans(x, 2)
data(trees, package="datasets")
cl2 <- kMeans(t(trees),3)
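
For a point of comparison, base R's `stats::kmeans` can be run on the same kind of simulated data. This is a reference sketch only: it is not what the package calls, it expects one observation per row, and cluster numbering is arbitrary, so assignments from two implementations may differ by a relabelling.

```r
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")

ref <- stats::kmeans(x, centers = 2, nstart = 5)  # base R reference clustering
table(ref$cluster)   # cluster sizes
ref$centers          # one row per cluster centre
```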
Run a linear regression (with optional ridge regression)
linearRegression(matX, vecY, lambda = 0, intercept = TRUE)

| matX | A matrix of explanatory variables (‘predictors’) in standard R format (i.e. ‘tall and skinny’), to be transposed internally to MLPACK format (i.e. ‘short and wide’). |
|---|---|
| vecY | A vector of dependent variables (‘responses’) |
| lambda | An optional ridge parameter, defaults to zero |
| intercept | An optional boolean switch about an intercept, default is true. |

This function performs a linear regression, and serves as a simple test case for accessing an MLPACK function.
A vector with fitted values
suppressMessages(library(utils))
data("trees", package="datasets")
X <- with(trees, cbind(log(Girth), log(Height)))
y <- with(trees, log(Volume))
lmfit <- lm(y ~ X) # summary(fitted(lmfit))
mlfit <- linearRegression(X, y) # summary(mlfit)
all.equal(unname(fitted(lmfit)), as.vector(mlfit))
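
To illustrate what a ridge parameter such as `lambda` does, here is a base R sketch of the textbook closed-form ridge estimate on the same data. `ridge_fitted` is a hypothetical helper, and leaving the intercept unpenalized is an assumption made for this sketch; mlpack's own treatment of the intercept may differ.

```r
data("trees", package = "datasets")
X <- with(trees, cbind(log(Girth), log(Height)))
y <- with(trees, log(Volume))

# Hypothetical helper: fitted values from the closed-form ridge solution.
ridge_fitted <- function(X, y, lambda = 0) {
  Xd <- cbind(1, X)                   # prepend an intercept column
  P  <- diag(c(0, rep(1, ncol(X))))   # penalize slopes only (assumption)
  beta <- solve(crossprod(Xd) + lambda * P, crossprod(Xd, y))
  drop(Xd %*% beta)
}

fit0 <- ridge_fitted(X, y)                    # lambda = 0 is ordinary least squares
all.equal(fit0, unname(fitted(lm(y ~ X))))
fit1 <- ridge_fitted(X, y, lambda = 1)        # shrunken slopes
c(rss0 = sum((y - fit0)^2), rss1 = sum((y - fit1)^2))
```

With `lambda = 0` the fitted values agree with `lm`; a positive `lambda` trades a larger residual sum of squares for smaller slope coefficients.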
A four column data set containing a binary variable ‘Employed’ (with zero denoting unemployment and one employment), a numeric variable ‘Bank Balance’, a numeric variable ‘Annual Salary’ and a binary target variable ‘Defaulted?’ (with zero denoting loan repayment and one denoting default).
The original source of the data is not documented by mlpack.
https://datasets.mlpack.org/LoanDefault.csv
Predict loan default using a decision tree model
loanDefaultPrediction(loanDataFeatures, loanDataTargets, pct = 0.25)

| loanDataFeatures | A matrix of dimension 3 by N, i.e. transposed relative to what R uses, with the three explanatory variables |
|---|---|
| loanDataTargets | A vector of (integer-valued) binary values indicating loan repayment or default |
| pct | A numeric variable with the percentage of data to be used for testing, defaults to 25% |

This function performs a loan default prediction based on three variables (employment, bank balance and annual salary) to predict loan repayment or default.
A list object with predictions, probabilities, accuracy and a report matrix
data(loanData)
res <- loanDefaultPrediction(t(as.matrix(loanData[,-4])), # col 1 to 3, transposed
                             loanData[, 4],               # col 4 is the target
                             0.25)                        # retain 25% for testing
str(res)
res$report
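
The returned list is described as containing predictions and an accuracy value. The layout of `res$report` is not documented here, so the sketch below does not reproduce it. It only shows, with made-up vectors, how accuracy and a confusion table are conventionally derived from binary predictions in base R.

```r
# Made-up values for illustration; 0 = repaid, 1 = defaulted
truth <- c(0, 0, 1, 1, 0, 1, 0, 0)
pred  <- c(0, 0, 1, 0, 0, 1, 1, 0)

accuracy  <- mean(pred == truth)                     # share of correct predictions
confusion <- table(truth = truth, predicted = pred)  # rows: truth, columns: prediction
accuracy
confusion
```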
Run a logistic regression returning classification
logisticRegression(data, labels, lambda = 0)

| data | A matrix of data values |
|---|---|
| labels | A vector of class labels |
| lambda | An optional L2 regularization parameter, defaults to zero |

This function performs a logistic regression on the given data set. The data set
is synthetic and follows an online example, sourced 2025-10-28, which gave no direct
source (as an example provided by Google / Gemini); it is now included in the
examples directory of the package.
A list with predictions, probabilities and parameters
data(logisticRegression)
X <- as.matrix(logisticRegressionData[, 1:2])
y <- as.matrix(logisticRegressionData[, 3])
res <- logisticRegression(X, y)
res$parameters
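
The result is described as holding predictions, probabilities and parameters. As a sketch of how these three usually relate in a logistic model, the base R code below maps a parameter vector to probabilities and then to class predictions. The parameter values are made up, and both the intercept-first ordering and the 0.5 threshold are assumptions of this sketch, not documented behaviour of `res$parameters`.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

params <- c(-1, 2, 0.5)      # made-up values: intercept, then two weights (assumed order)
X <- rbind(c(0, 0),
           c(1, 0),
           c(1, 2))          # three observations, one per row, two features each

prob <- sigmoid(drop(cbind(1, X) %*% params))  # probability of class 1
pred <- as.integer(prob >= 0.5)                # assumed decision threshold of 0.5
prob
pred
```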
A three column (synthetic) data set to illustrate logistic regression.
Run a Random Forest Classifier
randomForest(dataset, labels, pct = 0.3, nclasses = 7L, ntrees = 10L)

| dataset | A matrix of explanatory variables, i.e. “features” |
|---|---|
| labels | A vector of the dependent variable as integer values, i.e. “labels” |
| pct | A numeric value for the percentage of data to be retained for the test set |
| nclasses | An integer value for the number of distinct class values |
| ntrees | An integer value for the number of trees |

This function performs a Random Forest classification on a subset of the standard ‘covertype’ data set.
A list object
covertype_small
data(covertype_small) # see help(covertype_small)
res <- randomForest(covertype_small[-55,], # features (already transposed)
                    covertype_small[55,],  # labels now in [0, 6] range
                    0.3)                   # percentage used for testing
str(res) # accuracy varies as method is randomized but no seed set here