LightGBM
LightGBM is a gradient boosting framework that stands out for its unique approach to tree construction and feature handling. Two of its core innovations—histogram binning and leaf-wise tree growth—are central to its reputation for high speed and efficiency, especially on large datasets.
Histogram binning
- Discretizes continuous feature values into a fixed number of bins before training;
- Groups feature values into these bins, reducing the number of split candidates during tree construction;
- Speeds up computation and reduces memory usage, since raw feature data can be stored more compactly as bin indices.
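The binning idea can be sketched in plain NumPy. This is a simplified illustration, not LightGBM's actual histogram construction; the bin count of 255 mirrors LightGBM's default `max_bin`, and quantile-based edges stand in for its smarter edge selection:

```python
import numpy as np

rng = np.random.default_rng(42)
feature = rng.normal(size=10_000)  # one continuous feature

# Build bin edges from quantiles (a stand-in for LightGBM's
# histogram construction; the concept is the same)
max_bin = 255
edges = np.quantile(feature, np.linspace(0, 1, max_bin + 1)[1:-1])

# Replace raw float values with small integer bin indices
bin_indices = np.digitize(feature, edges).astype(np.uint8)

# Split candidates drop from ~10,000 unique floats to at most 255 bins,
# and storage drops from 8 bytes per value to 1
print("unique raw values:", np.unique(feature).size)
print("unique bins:", np.unique(bin_indices).size)
print("memory (raw float64):", feature.nbytes, "bytes")
print("memory (uint8 bins):", bin_indices.nbytes, "bytes")
```

After binning, a tree split only needs to scan bin boundaries instead of every distinct feature value, which is where the training speedup comes from.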
Leaf-wise tree growth
- Always splits the leaf with the maximum loss reduction, regardless of its depth;
- Differs from traditional level-wise algorithms that grow all leaves at the same depth in parallel;
- Also known as "best-first" or "leaf-wise" growth;
- Can produce deeper, more complex trees that capture intricate patterns in the data;
- Boosts accuracy, but may increase the risk of overfitting on smaller datasets—LightGBM provides parameters to control tree complexity.
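The best-first strategy above can be sketched as a priority queue over leaves. This is a toy simulation with made-up split gains, pure Python only; real LightGBM computes gains from gradient histograms:

```python
import heapq

# Each leaf carries a hypothetical split gain (loss reduction).
# heapq is a min-heap, so gains are negated to pop the max first.
leaves = [(-9.0, "root")]
heapq.heapify(leaves)

num_leaves_limit = 4  # analogous to LightGBM's num_leaves parameter
fake_child_gains = iter([5.0, 7.0, 2.0, 1.0, 3.0, 0.5])

while len(leaves) < num_leaves_limit:
    gain, leaf = heapq.heappop(leaves)  # leaf with max loss reduction
    print(f"splitting {leaf} (gain={-gain})")
    heapq.heappush(leaves, (-next(fake_child_gains), leaf + ".L"))
    heapq.heappush(leaves, (-next(fake_child_gains), leaf + ".R"))

print("final leaves:", sorted(name for _, name in leaves))
```

Note that the second split goes to `root.R` (gain 7.0) rather than balancing the tree level by level; a level-wise grower would have split both children of the root before going deeper. Capping `num_leaves` is the main lever LightGBM offers against the overfitting risk mentioned above.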
Together, histogram binning and leaf-wise growth allow LightGBM to train much faster and with a lower memory footprint than many other gradient boosting frameworks, particularly when handling large, high-dimensional datasets.
```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier

# Generate a synthetic dataset
X, y = make_classification(
    n_samples=20000, n_features=50, n_informative=30,
    n_redundant=10, n_classes=2, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Initialize LightGBM classifier
lgbm = LGBMClassifier(
    n_estimators=100, max_depth=8, learning_rate=0.1,
    subsample=0.8, colsample_bytree=0.8, random_state=42
)

# Time the training process
start_time = time.time()
lgbm.fit(X_train, y_train)
end_time = time.time()
fit_time = end_time - start_time
print("LightGBM fit time (seconds):", fit_time)
```
Compared to XGBoost, LightGBM's histogram-based binning and leaf-wise tree growth typically result in faster training times and lower memory consumption when using similar hyperparameters. While XGBoost uses a level-wise tree growth strategy and can be slower on large, high-dimensional datasets, LightGBM's optimizations allow it to process data more efficiently. However, the actual speed and memory advantage may depend on dataset characteristics and parameter settings.
You are given a synthetic binary classification dataset. Your task is to:
- Load and split the data.
- Initialize a LightGBM classifier with parameters: n_estimators=150, learning_rate=0.05, max_depth=6, subsample=0.8, colsample_bytree=0.8.
- Train the model and obtain predictions on the test set.
- Compute accuracy and store it in accuracy_value.
- Print the shapes of the datasets and the final accuracy.