ML Python · ~5 mins

CatBoost in ML Python

Introduction
CatBoost is a gradient boosting library that learns patterns from data to make predictions, and it is especially strong when the data contains categorical features like colors or types.
When you have data with categories like 'red', 'blue', or 'green' and want to predict something.
When you want a fast and easy way to build a model without much tuning.
When you want to avoid complicated data preparation for categorical data.
When you want good accuracy on tabular data like spreadsheets.
When you want to handle missing data automatically.
Syntax
ML Python
from catboost import CatBoostClassifier

model = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=6)
model.fit(X_train, y_train, cat_features=cat_features)
predictions = model.predict(X_test)
Use CatBoostClassifier for classification tasks and CatBoostRegressor for regression tasks.
Pass the indices (or column names) of categorical features via 'cat_features' so CatBoost encodes them internally.
Examples
Basic example with default settings and no categorical features specified.
ML Python
from catboost import CatBoostClassifier

model = CatBoostClassifier(iterations=50)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
Example specifying categorical features at columns 0 and 2 with custom parameters.
ML Python
model = CatBoostClassifier(iterations=200, learning_rate=0.05, depth=8)
model.fit(X_train, y_train, cat_features=[0, 2])
Example with silent training by setting verbose=0.
ML Python
model = CatBoostClassifier()
model.fit(X_train, y_train, cat_features=cat_features, verbose=0)
Sample Model
This example shows how to use CatBoost to classify data with a categorical feature 'color'. It trains the model and prints accuracy and predictions.
ML Python
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd

# Sample data with categorical and numeric features
data = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'green', 'red', 'blue', 'green', 'red'],
    'size': [1, 2, 3, 2, 1, 3, 2, 1],
    'weight': [10, 20, 30, 20, 10, 30, 20, 10],
    'label': [0, 1, 0, 1, 0, 0, 1, 0]
})

# Features and target (copy avoids a pandas SettingWithCopyWarning below)
X = data[['color', 'size', 'weight']].copy()
y = data['label']

# Convert the categorical feature to category dtype (optional; raw strings also work)
X['color'] = X['color'].astype('category')

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Indices of categorical features
cat_features = [0]

# Create and train model
model = CatBoostClassifier(iterations=50, learning_rate=0.1, depth=4, verbose=0)
model.fit(X_train, y_train, cat_features=cat_features)

# Predict
preds = model.predict(X_test)

# Accuracy
acc = accuracy_score(y_test, preds)
print(f"Accuracy: {acc:.2f}")
print(f"Predictions: {preds.tolist()}")
Important Notes
CatBoost automatically handles categorical features without manual encoding such as one-hot or label encoding.
You can control training verbosity with the 'verbose' parameter.
CatBoost works well even with small datasets and missing values.
Summary
CatBoost is a powerful tool for handling categorical data in machine learning.
It requires minimal data preparation and gives good results quickly.
Use CatBoostClassifier for classification and specify categorical features for best performance.