How Our AI Portfolio Optimizer Uses Machine Learning to Build Smarter Portfolios

Most portfolio tools ask you to pick stocks and then tell you how risky your choices are. That is backwards. The interesting question is not "how risky is this basket?" but "given a universe of 200 stocks, which 12 should I own, how much of each, and why?"

That is what the AI Portfolio Optimizer does. It uses a machine learning model trained on real market data to predict forward returns, then combines those predictions with risk management, correlation filtering, and position sizing to build complete portfolios tailored to your risk tolerance.

This article explains how each piece works, why certain design choices were made, and what the system can and cannot do.

The Problem With Equal-Weight Portfolios

The simplest portfolio strategy is to pick a handful of stocks and split your money evenly across them. It is easy to understand and easy to implement. It is also leaving performance on the table.

Equal weighting ignores everything we know about individual stocks. A high-conviction pick with low volatility gets the same allocation as a speculative name with twice the risk. Two stocks that move in lockstep both get full weight, doubling your exposure to a single risk factor without any diversification benefit.

The AI Portfolio Optimizer solves this by making allocation decisions based on predicted returns, measured risk, and how stocks move relative to each other. The result is a portfolio where every position earns its place.

Step 1: Predicting Forward Returns With XGBoost

At the core of the system is an XGBoost regression model trained to predict 30-day forward stock returns.

XGBoost is a gradient boosted decision tree algorithm. In this model it builds an ensemble of 300 individual decision trees, where each new tree learns from the mistakes of the previous ones. The model was chosen for its track record in tabular data competitions and its resistance to overfitting when properly tuned.

What the Model Sees

The model uses 19 features for each stock, covering price momentum, volatility, technical signals, and market structure. The main groups are:

Price returns over 1, 5, 20, and 60 days (short to medium-term momentum)
30-day rolling volatility (current risk regime)
RSI (momentum oscillator, overbought/oversold conditions)
Moving averages at 20 and 50 days, plus their ratio to current price
Volume ratio (current volume vs average, detecting unusual activity)
Price position relative to the 52-week high and low (where are we in the range?)

These features are simple, interpretable, and available for any liquid stock. No alternative data, no sentiment feeds, no black box inputs.
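The feature list above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the optimizer's actual pipeline: the function name build_features, the close and volume column names, a 14-day RSI computed from simple rolling averages (rather than Wilder's smoothing), a 20-day volume baseline, 252 closing prices standing in for the 52-week range, and a 30-row forward return as the training target are all assumptions. The list names fewer than 19 inputs, so the sketch builds only the ones listed.

```python
import pandas as pd


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI from simple rolling averages of gains and losses (not Wilder's smoothing)."""
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(period).mean()
    avg_loss = (-delta).clip(lower=0).rolling(period).mean()
    rs = avg_gain / avg_loss  # +inf when there were no losses, which yields RSI = 100
    return 100 - 100 / (1 + rs)


def build_features(df: pd.DataFrame, horizon: int = 30) -> pd.DataFrame:
    """Hypothetical feature builder for one stock's daily bars (columns: close, volume)."""
    close, volume = df["close"], df["volume"]
    f = pd.DataFrame(index=df.index)

    # Momentum: trailing returns over 1, 5, 20 and 60 trading days
    for n in (1, 5, 20, 60):
        f[f"ret_{n}d"] = close / close.shift(n) - 1

    # Risk regime: 30-day rolling standard deviation of daily returns
    f["vol_30d"] = (close / close.shift(1) - 1).rolling(30).std()

    f["rsi_14"] = rsi(close)

    # Moving averages and the ratio of price to each
    for n in (20, 50):
        ma = close.rolling(n).mean()
        f[f"ma_{n}"] = ma
        f[f"price_to_ma_{n}"] = close / ma

    # Unusual activity: volume relative to its 20-day average (assumed window)
    f["volume_ratio"] = volume / volume.rolling(20).mean()

    # Position in the 52-week range, approximated with 252 closing prices
    f["pct_from_52w_high"] = close / close.rolling(252).max() - 1
    f["pct_from_52w_low"] = close / close.rolling(252).min() - 1

    # Training target: return over the next `horizon` rows (NaN for the last rows)
    f["fwd_ret"] = close.shift(-horizon) / close - 1
    return f
```

Rows at the start (not enough lookback) and at the end (no forward return yet) contain NaN and would be dropped before training.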

Training on 42 Stocks

The model trains on a curated set of 42 stocks spanning sectors and market caps, from mega-cap names like AAPL and MSFT to mid-caps like CRWD and SNOW, plus international stocks like L'Oreal (OR.PA) and SAP (SAP.DE). The training window covers roughly 11 months of daily data, giving the model around 9,300 samples to learn from.

The data is split three ways:

60% training set (5,578 samples): The model learns patterns here
20% test set (1,860 samples): Used as a validation set for early stopping, to prevent overfitting
20% hold-out set (1,860 samples): Never touched during training, used only for final evaluation

This three-way split is critical. The hold-out set provides an unbiased estimate of real-world performance because the model has never seen any of that data during training or tuning.
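A minimal sketch of the 60/20/20 split, assuming the samples are sorted by date so that the hold-out set is the most recent slice. The article does not say whether the split is chronological or shuffled; chronological ordering is the usual choice for time series because it keeps later information out of training. The function name is illustrative.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def three_way_split(
    samples: Sequence[T], train_frac: float = 0.6, test_frac: float = 0.2
) -> Tuple[List[T], List[T], List[T]]:
    """Split date-ordered samples into train / test (early stopping) / hold-out.

    Assumes `samples` is already sorted by date, oldest first.
    """
    n = len(samples)
    train_end = int(n * train_frac)
    test_end = int(n * (train_frac + test_frac))
    train = list(samples[:train_end])          # the model fits its trees here
    test = list(samples[train_end:test_end])   # only decides when to stop adding trees
    holdout = list(samples[test_end:])         # never seen during training or tuning
    return train, test, holdout
```

With 9,298 samples this yields 5,578, 1,860, and 1,860, matching the counts above.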

Honest Performance Numbers

Let us be upfront about what the model can and cannot do.

Individual stock prediction accuracy:

R² Score: 0.037. That is weak. It means the model explains about 3.7% of the variance in 30-day returns.
Directional Accuracy: 63.2%. Better. The model correctly predicts whether a stock will go up or down about 63% of the time.
Average return on long picks: +2.57% per 30 days.

An R² of 0.037 looks underwhelming in isolation, but it is actually realistic for stock prediction. Academic research consistently shows that even the best models rarely achieve R² above 0.05 on forward returns. Markets are noisy. The edge is small.

The key insight is that a small edge, applied consistently across a diversified portfolio with proper risk management, compounds into meaningful outperformance. The model does not need to predict individual stocks perfectly. It needs to be right slightly more often than wrong, and size positions appropriately when it is right.
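The three prediction metrics listed above can be computed from paired predictions and realized 30-day returns. A plain-Python sketch; the function name is hypothetical, R² is the standard 1 - SS_res / SS_tot, and two definitions are assumptions: a direction counts as "up" only when the return is strictly positive, and "long picks" are the stocks with a positive predicted return.

```python
from typing import Dict, Sequence


def evaluate_predictions(predicted: Sequence[float], actual: Sequence[float]) -> Dict[str, float]:
    """Hypothetical evaluation of 30-day return predictions on the hold-out set."""
    n = len(actual)
    mean_actual = sum(actual) / n

    # R^2 = 1 - SS_res / SS_tot
    ss_res = sum((a - p) ** 2 for p, a in zip(predicted, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    # Directional accuracy: predicted "up" (> 0) matches realized "up" (> 0)
    hits = sum((p > 0) == (a > 0) for p, a in zip(predicted, actual))

    # Average realized return of the stocks the model predicted would rise
    longs = [a for p, a in zip(predicted, actual) if p > 0]
    avg_long = sum(longs) / len(longs) if longs else float("nan")

    return {"r2": r2, "directional_accuracy": hits / n, "avg_long_return": avg_long}
```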

Portfolio-level performance on the hold-out set:

AI Portfolio return: +2.79% over 30 days
Equal-weight baseline: +2.72%
SPY benchmark: +2.59%

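On the hold-out period the model beats both baselines: by 0.07 percentage points over equal weight and 0.20 points over SPY. That is a small margin for any single period; the case for the approach is that the edge holds consistently enough to matter over time.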

Step 2: Measuring Risk Beyond Volatility

Most risk models equate risk with volatility. A stock that swings 3% daily is considered riskier than one that moves 1%. But this misses something important: direction matters.

A stock that goes up 3% on most days and down 1% on bad days is not the same risk as a stock that drops 3% regularly and only recovers 1%. Traditional volatility treats both identically.

The AI Portfolio Optimizer uses a directional risk model that separates upside from downside:

Directional bias (50% of risk score): How often does the stock go down versus up? A stock that declines 55% of trading days carries more risk than one that declines 45% of trading days, even if their volatility is identical.
Downside severity (30% of risk score): When the stock does go down, how bad is it? This captures tail risk, the difference between a stock that dips 0.5% on bad days versus one that drops 3%.
Asymmetry penalty (20% of risk score): Are the losses bigger than the gains? If average down days are -2% and average up days are +1%, the math works against you regardless of win rate.

Liquidity risk is layered on top. A micro-cap stock trading 50,000 shares per day carries more execution risk than a mega-cap trading 20 million shares, and the risk score reflects that.

The final composite score classifies each stock as LOW (0-33), AVERAGE (33-66), or HIGH (66-100) risk, giving the portfolio builder a clear signal about how much weight to assign.
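The article gives the component weights but not how each component is scaled, so the sketch below fills those gaps with labeled assumptions: directional bias is the share of down days, severity saturates at an average down-day loss of 3%, and the asymmetry penalty runs from 0 when average losses equal average gains up to its maximum when losses are twice the gains (echoing the examples above). The liquidity adjustment is not modeled, and scores of exactly 33 or 66 go to the higher bucket because the published ranges share endpoints. All function names are hypothetical.

```python
from typing import Sequence


def _clip01(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def directional_risk_score(
    daily_returns: Sequence[float], severe_loss: float = 0.03, max_ratio: float = 2.0
) -> float:
    """Hypothetical 0-100 risk score: 50% bias, 30% severity, 20% asymmetry."""
    downs = [r for r in daily_returns if r < 0]
    ups = [r for r in daily_returns if r > 0]

    # Directional bias: fraction of days that closed down
    bias = len(downs) / len(daily_returns)

    # Downside severity: average down-day loss, saturating at severe_loss (assumed 3%)
    avg_loss = -sum(downs) / len(downs) if downs else 0.0
    severity = _clip01(avg_loss / severe_loss)

    # Asymmetry: 0 when avg loss <= avg gain, 1 when avg loss >= max_ratio x avg gain
    avg_gain = sum(ups) / len(ups) if ups else 0.0
    if avg_gain == 0.0:
        asymmetry = 1.0 if avg_loss > 0 else 0.0
    else:
        asymmetry = _clip01((avg_loss / avg_gain - 1.0) / (max_ratio - 1.0))

    return 100.0 * (0.5 * bias + 0.3 * severity + 0.2 * asymmetry)


def risk_bucket(score: float) -> str:
    """LOW below 33, AVERAGE below 66, HIGH otherwise (boundary handling assumed)."""
    if score < 33:
        return "LOW"
    if score < 66:
        return "AVERAGE"
    return "HIGH"
```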

Step 3: Correlation Filtering

Diversification only works if your holdings actually move independently. Owning ten tech stocks is not diversification; it is concentration with extra transaction costs.

The optimizer enforces a hard constraint: no two stocks in the final portfolio can have a pairwise correlation above 0.65.

The algorithm works like this:

Rank all candidate stocks by a priority score that combines ML predicted return and risk adjustment
Select the highest-priority stock
For each remaining candidate, check its correlation against every stock already in the portfolio
Only add the candidate if all pairwise correlations are at or below 0.65
If it fails the correlation check, skip it and try the next candidate
Continue until the portfolio reaches the target size (typically 12 stocks)

This means the optimizer will reject a high-conviction pick if it moves too closely with something already in the portfolio. You might miss the second-best tech stock, but you avoid the scenario where a single sector rotation wipes out half your positions.
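The selection steps above amount to a greedy filter. A minimal sketch, assuming the priority scores are already computed and pairwise correlations are supplied as a nested dictionary; the function name and input shapes are illustrative.

```python
from typing import Dict, List


def select_uncorrelated(
    priority: Dict[str, float],
    corr: Dict[str, Dict[str, float]],
    target_size: int = 12,
    max_corr: float = 0.65,
) -> List[str]:
    """Greedily pick the highest-priority stocks whose pairwise correlations stay <= max_corr."""
    ranked = sorted(priority, key=priority.get, reverse=True)  # highest priority first
    portfolio: List[str] = []
    for ticker in ranked:
        if len(portfolio) == target_size:
            break
        # Accept only if the candidate passes the check against every current holding
        if all(corr[ticker][held] <= max_corr for held in portfolio):
            portfolio.append(ticker)
    return portfolio
```

Because the loop is greedy, it never revisits an earlier choice, and it can return fewer than target_size stocks if the candidate list runs out.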

Step 4: Position Sizing With Fractional Kelly

Once the optimizer has selected 12 stocks that pass the correlation filter, it needs to decide how much capital to allocate to each one. This is where the Kelly Criterion comes in.

The Kelly formula calculates the mathematically optimal bet size to maximize long-term portfolio growth:

f* = (p × b - q) / b

Where p is the probability of a positive return, q = 1 - p is the probability of a negative return, and b is the ratio of average wins to average losses.

Full Kelly sizing is theoretically optimal but practically aggressive. It leads to large concentrated positions that can produce severe drawdowns. The optimizer uses fractional Kelly at 25%, meaning it takes one quarter of the Kelly-recommended position size. This sacrifices some expected return for significantly smoother results.

Position sizes are also capped at 25% maximum per stock. No single position can dominate the portfolio regardless of how confident the model is.
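A worked example first: with p = 0.6 and b = 1.5, full Kelly gives f* = (0.6 × 1.5 - 0.4) / 1.5 = 0.5 / 1.5 ≈ 0.333, so quarter Kelly allocates about 8.3%. The sketch below applies the same arithmetic with the 25% cap. It assumes a negative Kelly value means no position (long-only), and it leaves out how p and b are estimated and what happens to any capital the sizes leave unallocated, since the article specifies neither. Function names are hypothetical.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Full Kelly: f* = (p*b - q) / b, where q = 1 - p."""
    q = 1.0 - p
    return (p * b - q) / b


def position_size(p: float, b: float, fraction: float = 0.25, cap: float = 0.25) -> float:
    """Fractional Kelly size, floored at 0 (long-only assumption) and capped per stock."""
    size = fraction * kelly_fraction(p, b)
    return min(max(size, 0.0), cap)
```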

Step 5: Core Stock Preference

Rather than locking a fixed percentage of the portfolio into predetermined stocks, the optimizer treats core holdings as preferred candidates that must earn their place.

Each risk level defines a set of stability-oriented stocks that receive a priority boost during both selection and position sizing. These stocks are more likely to appear in the final portfolio, but the ML model can reduce their weight or drop them entirely if predictions are unfavorable.

Conservative (prefers ~55% in stability stocks):
SPY, Johnson & Johnson, Procter & Gamble, Microsoft, Berkshire Hathaway

Moderate (prefers ~45% in stability stocks):
QQQ, Johnson & Johnson, Microsoft, Apple

Aggressive (prefers ~35% in stability stocks):
QQQ, Nvidia, Microsoft, Apple

The priority boost works at two levels. In the correlation filter, core stocks get a 1.5x multiplier on their priority score, making them more likely to be selected ahead of other candidates. In position sizing, their Kelly fraction gets a 1.3x boost, nudging more capital toward them when the math supports it.

The key difference from a hard allocation floor: if the model predicts Microsoft will underperform over the next 30 days, it can reduce that position or replace it with something better. Every stock in the portfolio earns its weight through the same ML-driven process. Core stocks start with an advantage, but they do not get a free pass.
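The two boosts can be sketched as simple multipliers on precomputed scores. This assumes the boosts are plain multiplications and that the 25% position cap is applied after boosting; the article gives the multipliers but not the order of operations. Names are hypothetical.

```python
from typing import Dict, Set, Tuple

CORE_PRIORITY_BOOST = 1.5  # applied to the ranking score in the correlation filter
CORE_KELLY_BOOST = 1.3     # applied to the fractional Kelly position size
MAX_POSITION = 0.25        # per-stock cap, assumed to apply after boosting


def apply_core_boosts(
    priority: Dict[str, float],
    kelly: Dict[str, float],
    core: Set[str],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return boosted priority scores and boosted, capped position sizes."""
    boosted_priority = {
        t: score * (CORE_PRIORITY_BOOST if t in core else 1.0) for t, score in priority.items()
    }
    boosted_kelly = {
        t: min(size * (CORE_KELLY_BOOST if t in core else 1.0), MAX_POSITION)
        for t, size in kelly.items()
    }
    return boosted_priority, boosted_kelly
```

One property of multiplicative boosts: a position size of zero stays zero, so a core stock with an unfavorable prediction still receives no capital, and a negative priority score becomes more negative, so in this sketch the boost only helps core stocks that already score positively.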

The Stock Universe

The optimizer selects from approximately 200 pre-screened stocks. Every stock in the universe passes these quality filters:

Price above $3 (no penny stocks)
Daily volume above 100,000 shares (sufficient liquidity for execution)
Market cap above $500 million (avoids micro-caps with unreliable data)
At least one year of price history (enough data for meaningful analysis)
Correlation with SPY below 0.95 (avoids near-duplicate broad market exposure)
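The screen can be expressed as a single predicate. A minimal sketch; the Candidate fields and function names are hypothetical, "one year" is taken as 252 trading days, and volume is assumed to be an average daily figure.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Candidate:
    ticker: str
    price: float
    avg_daily_volume: float
    market_cap: float
    trading_days_of_history: int
    corr_with_spy: float


def passes_quality_filters(c: Candidate) -> bool:
    return (
        c.price > 3.0                        # no penny stocks
        and c.avg_daily_volume > 100_000     # enough liquidity to execute
        and c.market_cap > 500_000_000       # avoid micro-caps
        and c.trading_days_of_history >= 252 # about one year of history
        and c.corr_with_spy < 0.95           # not a near-duplicate of the index
    )


def screen(candidates: Iterable[Candidate]) -> List[str]:
    return [c.ticker for c in candidates if passes_quality_filters(c)]
```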

The universe spans sectors: 21% Technology, 15% Consumer Discretionary, 13% Healthcare, 13% Industrials, 12% Finance, and smaller allocations across Energy, Utilities, Real Estate, and others.

It also includes international stocks: L'Oreal, SAP, Siemens, Airbus, LVMH, Roche, and several Canadian names. Geographic diversification adds another layer of protection against single-country risk.

Three Risk Profiles

The optimizer generates portfolios for three risk levels. Each adjusts the core-stock preference, position sizing aggressiveness, and target return profile:

Conservative: Targets 8-10% annual returns with volatility under 15%. Prefers roughly 55% of the portfolio in five core stocks. Suitable for capital preservation with modest growth.

Moderate: Targets 12-15% annual returns with 18-22% volatility. Prefers roughly 45% in four core stocks. Balances growth and stability.

Aggressive: Targets 18-25% annual returns with 25-35% volatility. Prefers roughly 35% in four growth-oriented core stocks. Accepts higher drawdowns for higher expected returns.

How the Model Stays Current

Markets change. A model trained on bull market data may not work in a correction. The optimizer handles this through conditional retraining rather than a rigid schedule.

Weekly data updates (10 seconds): The system adds the latest 7 days of market data and drops the oldest 7 days, maintaining a rolling 2-year training window of about 21,000 data points.

Weekly performance monitoring (20 seconds): The model's recent predictions are compared against actual results. Did the AI portfolio beat SPY? Did it beat the equal-weight baseline?

Conditional retraining (90 seconds, only when needed): If the model underperforms benchmarks for two consecutive weeks, it retrains on the updated dataset. If it is still beating benchmarks, the existing model stays in production.

This approach cuts the number of retraining runs by about 80%. Instead of retraining every week regardless of performance, the system only retrains when there is evidence that the current model has degraded. It also cuts API calls by 98.6% compared with re-fetching the full training window for every retraining cycle, since accumulated data is stored in the database and only the latest week is fetched.
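The weekly cycle can be sketched as two small functions: one that rolls the training window forward, and one that decides whether to retrain. Assumptions: rows are tuples whose first element is an ISO date string, the window is measured in distinct trading dates (about 504 for two years), and "underperforms benchmarks" means failing to beat at least one of SPY and the equal-weight baseline in a given week. Names are hypothetical.

```python
from typing import List, Sequence, Tuple

WeeklyResult = Tuple[float, float, float]  # (ai_return, spy_return, equal_weight_return)


def roll_window(rows: List[tuple], new_rows: List[tuple], window_dates: int = 504) -> List[tuple]:
    """Append new rows and keep only the most recent `window_dates` distinct dates.

    Rows are assumed to start with an ISO date string (e.g. "2024-05-17"),
    so string sorting matches chronological order.
    """
    combined = rows + new_rows
    recent_dates = set(sorted({row[0] for row in combined})[-window_dates:])
    return [row for row in combined if row[0] in recent_dates]


def beat_benchmarks(result: WeeklyResult) -> bool:
    ai, spy, equal_weight = result
    return ai > spy and ai > equal_weight


def should_retrain(history: Sequence[WeeklyResult], weeks: int = 2) -> bool:
    """Retrain if, in each of the last `weeks` weeks, the model failed to beat a benchmark."""
    if len(history) < weeks:
        return False
    return not any(beat_benchmarks(r) for r in history[-weeks:])
```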

What You See in the App

When you open the AI Portfolio Optimizer on StockIceberg, the workflow is simple:

Choose your risk level (Conservative, Moderate, or Aggressive)
Set your capital ($1,000 to $1,000,000)
Adjust the number of stocks (8 to 15, default 12)
Click Generate

The system returns a complete portfolio with:

Summary metrics: Expected return, volatility, Sharpe ratio
Position table: Every stock with its weight, dollar allocation, share count, risk level, expected return, and sector
Allocation chart: Visual breakdown of position weights
Risk distribution: How many positions are LOW, AVERAGE, and HIGH risk
Sector distribution: A check that you are not accidentally concentrated in one sector
Risk-return scatter: Each position plotted by expected return vs volatility, with bubble size proportional to weight

You can export the portfolio as CSV or JSON for further analysis or to use in other tools.
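The summary metrics follow standard portfolio formulas: expected return is the weighted sum of per-stock expected returns, volatility is sqrt(wᵀΣw) for weight vector w and covariance matrix Σ, and the Sharpe ratio is excess return divided by volatility. A minimal NumPy sketch; the function name, the risk-free rate parameter, and rounding down to whole shares (leaving the remainder in cash) are assumptions, and all inputs are assumed to be on the same time scale (for example, all annualized).

```python
import numpy as np


def summarize_portfolio(weights, expected_returns, cov, prices, capital, risk_free=0.0):
    """Hypothetical summary: expected return, volatility, Sharpe ratio, and share counts."""
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(expected_returns, dtype=float)
    sigma = np.asarray(cov, dtype=float)

    exp_return = float(w @ mu)                  # weighted average of expected returns
    volatility = float(np.sqrt(w @ sigma @ w))  # sqrt(w^T Cov w)
    sharpe = (exp_return - risk_free) / volatility if volatility > 0 else float("nan")

    dollars = capital * w
    shares = np.floor(dollars / np.asarray(prices, dtype=float)).astype(int)  # whole shares only
    return {
        "expected_return": exp_return,
        "volatility": volatility,
        "sharpe": sharpe,
        "dollars": dollars.tolist(),
        "shares": shares.tolist(),
    }
```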

What This System Does Not Do

Transparency matters, so here is what the optimizer does not claim:

It does not guarantee returns. The model's edge is small and statistical. Some 30-day periods will underperform. The advantage shows up over many periods, not every period.

It does not account for transaction costs. Spreads, commissions, and slippage are not modeled. For liquid large-cap stocks these costs are small, but they are not zero.

It does not rebalance automatically. The optimizer generates a portfolio at a point in time. As prices move, actual weights will drift from targets. Periodic rebalancing, which is on the roadmap, would keep actual weights aligned with those targets.

It was trained during a specific market regime. The current model learned primarily from recent market conditions. A prolonged bear market or a regime shift could require different features or model architecture.

It is not investment advice. The AI Portfolio Optimizer is an analytical tool. It processes data and applies algorithms. It does not know your financial situation, goals, tax circumstances, or risk capacity. Use it as one input among many.

Why Portfolio-Level Thinking Matters

The most important lesson from building this system is that individual stock prediction is nearly impossible, but portfolio-level outperformance is achievable.

An R² of 0.037 on individual stocks sounds useless. But combine a slightly-better-than-random predictor with correlation filtering that enforces diversification, position sizing that allocates more to higher-conviction picks, core stocks that help limit drawdowns, and risk assessment that penalizes downside asymmetry, and the whole becomes meaningfully greater than the sum of its parts.

That is the real value of the optimizer. Not any single prediction, but the disciplined combination of many small edges into a coherent portfolio.

Getting Started

Head to the AI Portfolio Optimizer page on StockIceberg. Choose your risk level, set your capital, and generate your first portfolio. Compare the suggested allocation against what you currently hold. Look at the correlation structure, the sector distribution, the risk breakdown.

You do not have to follow the portfolio exactly. Use it as a starting point, a data-driven second opinion on how your capital could be allocated. Then adjust based on your own knowledge, convictions, and constraints.

The AI Portfolio Optimizer is an analytical tool for educational purposes. Past performance and backtesting results do not guarantee future returns. All investment decisions carry risk, including the potential loss of principal. Always consult with a qualified financial advisor before making investment decisions.