What is random forest?
Random Forest is a popular and versatile machine learning method that is capable of performing both regression and classification tasks. It is a type of ensemble learning method, where a group of weak models combine to form a strong model. In Random Forest, the weak models are decision trees.
Here's a brief overview of how Random Forest works:
Bootstrap Data: Random Forest starts by selecting random samples from the dataset. This is done with replacement, meaning the same sample can be chosen multiple times. This process is known as bootstrapping.
Build Decision Trees: For each bootstrap sample, a decision tree is built. At each node of the tree, only a random subset of the features is considered when choosing the best split. This second source of randomness makes the trees less similar to one another, and together with bootstrapping it is what puts the "random" in Random Forest.
Make Predictions: For a classification problem, each tree in the forest gives a "vote" for the class, and the class with the most votes is the prediction of the Random Forest. For a regression problem, the average prediction of all the trees is the prediction of the Random Forest.
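The three steps above can be sketched by hand with plain decision trees. This is a minimal illustration, not how scikit-learn implements it internally: the number of trees (25), the seeds, and the names `trees`, `votes`, and `forest_pred` are assumptions made for this example. Note also that scikit-learn's own RandomForestClassifier averages the trees' predicted class probabilities rather than counting hard votes, so its results can differ slightly from this sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rng = np.random.default_rng(42)
n_trees = 25  # illustrative value, not a recommendation
trees = []

for _ in range(n_trees):
    # Step 1: bootstrap sample, drawn with replacement,
    # the same size as the training set.
    idx = rng.integers(0, len(X_train), size=len(X_train))

    # Step 2: a tree that looks at a random subset of the features
    # (the square root of the feature count) at every split.
    tree = DecisionTreeClassifier(
        max_features="sqrt",
        random_state=int(rng.integers(0, 2**31 - 1)),
    )
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Step 3: every tree votes, and the most common class wins.
votes = np.stack([tree.predict(X_test) for tree in trees])  # (n_trees, n_test)
n_classes = len(np.unique(y_train))
vote_counts = np.array(
    [np.bincount(col, minlength=n_classes) for col in votes.T]
)  # (n_test, n_classes)
forest_pred = vote_counts.argmax(axis=1)

accuracy = (forest_pred == y_test).mean()
print(f"Hand-rolled forest accuracy: {accuracy:.3f}")
```

Taking the argmax of the vote counts works here because the iris labels are already the integers 0, 1, and 2; with other label encodings you would need to map the index back to the class label.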
The main advantages of Random Forest are that it can model complex interactions between features, it doesn't require feature scaling, and it's less likely to overfit than a single decision tree because it averages the predictions of many decision trees.
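One way to see the overfitting point for yourself is to compare training and test accuracy for a single decision tree and for a forest on the same data. The sketch below uses a synthetic dataset with some label noise. The dataset settings, the seeds, and the `scores` dictionary are assumptions chosen for illustration, and the exact numbers will vary with the data and the seed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% of the labels flipped at random (flip_y),
# so that memorizing the training set does not pay off on the test set.
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=5,
    flip_y=0.1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)  # accuracy on seen data
    test_acc = model.score(X_test, y_test)     # accuracy on unseen data
    scores[name] = (train_acc, test_acc)
    print(f"{name}: train={train_acc:.3f}, test={test_acc:.3f}")
```

A large gap between the training and test accuracy is the usual sign of overfitting, so the thing to compare is how big that gap is for each model.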
Example of RandomForestClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Random Forest classifier
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit the model on the training data
rf.fit(X_train, y_train)

# Make predictions on the test data
predictions = rf.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, predictions)

# Print the accuracy
print(f'Accuracy: {accuracy}')
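For regression the workflow is nearly identical, with RandomForestRegressor in place of the classifier. The sketch below also checks the averaging rule described earlier: the forest's prediction should match the mean of the individual trees' predictions. The synthetic dataset, its settings, and the variable names (`rf_reg`, `per_tree`, `manual_mean`) are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data; sizes and noise level are illustrative.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the Random Forest regressor on the training data
rf_reg = RandomForestRegressor(n_estimators=100, random_state=42)
rf_reg.fit(X_train, y_train)
forest_pred_reg = rf_reg.predict(X_test)

# Predict with every fitted tree separately, then average by hand
per_tree = np.stack([tree.predict(X_test) for tree in rf_reg.estimators_])
manual_mean = per_tree.mean(axis=0)

print("Forest prediction equals mean of trees:",
      np.allclose(forest_pred_reg, manual_mean))
print(f"R^2 on test data: {rf_reg.score(X_test, y_test):.3f}")
```

The fitted trees are available through the `estimators_` attribute, which is handy whenever you want to inspect what the individual trees are doing.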