Arkved's Tower

A Fair Value Model for MMA Prediction Markets

Intro

In my last post, I wrote about why MMA is perhaps the most difficult sport in the world to model.

The short version is that MMA breaks a lot of the assumptions that make traditional sports models work. Fighters do not compete very often, the data is extremely sparse, styles interact in nonlinear ways, fights are state-dependent, and a single mistake can end the entire fight. I also talked about the limitations of various tools and models traditionally used to model sports and other competitive games. Elo is useful, but it compresses skill into one number, which doesn't work for a sport as high-dimensional as MMA. Poisson models are nice, but they assume a kind of stability that MMA doesn't have. Even basic fighter averages can be misleading because the samples are tiny and very opponent-dependent.

So after spending a lot of time thinking about why MMA is hard to model, I was determined to be the first person to actually make a successful betting model.

JDM

Over the last few weeks at the Recurse Center, I have been building an MMA prediction model that estimates the probability of one fighter beating another. My goal is not to predict every fight perfectly, because frankly, that is impossible. Instead, my goal is to build a model that can estimate fair probabilities well enough to compare them against sportsbook odds or prediction market prices.

Essentially, the question I am asking is, "What is the fair probability of a fighter winning, and is the market price meaningfully different from that fair probability?"

For example, if my model thinks Fighter A has a 60% chance to win, and the market also prices Fighter A around 60%, then there isn't really any action to take. The model and the market basically agree, so there is no mispricing for me to exploit. But if my model thinks Fighter A has a 60% chance to win and the market prices them at 45%, then we might have an actionable trading opportunity.

If we can make a model that can find even a small proportion of these opportunities, then we can make a viable, profitable trading model for MMA.

The Core Problem

At the most basic level, a fight can be represented as a binary classification problem. For a bout between Fighter A and Fighter B, the target is:

$$y = \begin{cases} 1, & \text{if Fighter A wins} \\ 0, & \text{if Fighter B wins} \end{cases}$$

The model estimates:

$$\hat{p} = P(y = 1 \mid X)$$

where X is the information available before the fight.

A very common mistake in sports modeling is to let future information leak into the dataset. In MMA, this can happen if post-fight statistics sneak into the feature set. For example, if a row contains the number of strikes landed in the same fight I am trying to predict, then the model is not predicting the fight, it is cheating by using information that it shouldn't have.

One of the first rules of my pipeline is that every feature has to be something that would have been known before the bout happened. This includes historical metrics such as fighter history and previous fight statistics, as well as physical statistics that aren't subject to variability. Things such as age (date of birth), height, reach, stance, weight class, layoff time, prior wins and losses, and other engineered features based on past fights are all allowed.

I also decided not to use sportsbook odds as input features. Vegas picks the winner only about 60-65% of the time at best, and if you train a model with sportsbook odds as an input feature, the model will lean on those odds as one of its main predictors for the fight. This effectively defeats the purpose of creating an independent model. For this project, I want the model to estimate probability independently, and then compare that estimate to the market afterward.

I don't want the market price to be an input to the model. The market price is the thing I will eventually want to trade against.

Representing a Fight

The next problem is figuring out how to represent a matchup.

The basic approach would be to build a feature vector for each fighter separately and ask the model to figure out who is better. The problem is, MMA doesn't really work like that. A fighter can be excellent in general and still have a terrible matchup against a specific opponent. For example, a kickboxer with great distance management might look amazing until they face someone who can force Thai clinches and/or takedowns (see Max Holloway vs. Charles Oliveira). A freestyle wrestler might dominate one opponent and then get stuck in a striking match against someone they cannot hold down for 5 rounds (see Khamzat Chimaev vs. Sean Strickland).

So what I decided to do is represent each bout as a matchup between two fighter profiles.

If Fighter A has feature vector $x_A$ and Fighter B has feature vector $x_B$, then a simple matchup representation is:

$$z = x_A - x_B$$

where z captures the differences between the two fighters.

For example:

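$$
\begin{aligned}
\text{age difference} &= \text{age}_A - \text{age}_B \\
\text{reach difference} &= \text{reach}_A - \text{reach}_B \\
\text{striking output difference} &= \text{SLpM}_A - \text{SLpM}_B \\
\text{takedown accuracy difference} &= \text{TDAcc}_A - \text{TDAcc}_B
\end{aligned}
$$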

This is already better than looking at raw fighter stats in isolation because the model is being asked to evaluate relative advantages, but we still need to go deeper.
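As a rough illustration of the difference representation above, here is a minimal Python sketch. The `FighterProfile` class, its field names, and the sample numbers are all hypothetical and only stand in for whatever pre-fight features the real pipeline uses.

```python
from dataclasses import dataclass


@dataclass
class FighterProfile:
    # Hypothetical pre-fight profile; field names are illustrative only.
    age: float
    reach: float    # inches
    slpm: float     # significant strikes landed per minute
    td_acc: float   # takedown accuracy, between 0 and 1


def matchup_diff(a: FighterProfile, b: FighterProfile) -> dict[str, float]:
    """Return z = x_A - x_B as named difference features."""
    return {
        "age_diff": a.age - b.age,
        "reach_diff": a.reach - b.reach,
        "slpm_diff": a.slpm - b.slpm,
        "td_acc_diff": a.td_acc - b.td_acc,
    }


# Made-up numbers, not real fighters.
fighter_a = FighterProfile(age=29, reach=72, slpm=5.5, td_acc=0.40)
fighter_b = FighterProfile(age=34, reach=70, slpm=4.0, td_acc=0.25)
z = matchup_diff(fighter_a, fighter_b)
```

Swapping the two fighters flips the sign of every feature, which is a handy sanity check on a difference-based dataset.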

Some features matter because of how they interact with each other. A fighter's high striking output might be valuable against one opponent, but dangerous against another. A reach advantage matters more if the fighter can actually maintain distance, or if there's a huge difference in height. A takedown threat matters even when takedowns fail, because it can change how freely the opponent is willing to strike.

I want the model to look something like:

$$P(\text{A beats B}) = f(x_A, x_B)$$

where f learns not just the difference between fighters, but the way their traits interact.

A more explicit way to interpret this is as follows:

$$P(\text{A beats B}) = \sigma\left(\beta^T (x_A - x_B) + x_A^T M x_B\right)$$

The first term:

$$\beta^T (x_A - x_B)$$

captures differences between fighter attributes.

The second term:

$$x_A^T M x_B$$

captures interactions between Fighter A's traits and Fighter B's traits.

This equation is really just a simplified version of the intuition behind the project. A fight is determined by how two multidimensional fighter profiles collide.
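To make the two terms concrete, here is a small Python sketch of that equation. It is only an illustration of the intuition, not the model I actually train; the two-dimensional "striking" and "grappling" traits and the values of `beta` and `M` are made up.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def win_probability(x_a, x_b, beta, M):
    """P(A beats B) = sigmoid(beta^T (x_A - x_B) + x_A^T M x_B)."""
    # Difference term: how much better A is on each attribute.
    diff_term = sum(w * (a_i - b_i) for w, a_i, b_i in zip(beta, x_a, x_b))
    # Interaction term: how A's trait i plays against B's trait j.
    interaction_term = sum(
        x_a[i] * M[i][j] * x_b[j]
        for i in range(len(x_a))
        for j in range(len(x_b))
    )
    return sigmoid(diff_term + interaction_term)


# Hypothetical traits: [striking, grappling].
x_a = [0.6, 0.2]
x_b = [0.3, 0.7]
beta = [1.0, 1.0]
# Antisymmetric M (M^T = -M): a striker is penalized against a grappler here.
M = [[0.0, -2.0],
     [2.0, 0.0]]

p_ab = win_probability(x_a, x_b, beta, M)
p_ba = win_probability(x_b, x_a, beta, M)
```

One design note: if `M` is antisymmetric, swapping the fighters flips the sign of the whole argument, so the two probabilities sum to 1. With an arbitrary `M` that is not guaranteed.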

Feature Engineering

The model starts with historical fight data and fighter-level statistics, but we need to turn that data into features that actually describe the dynamics of a fight.

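The main feature groups I use are physical attributes, striking statistics, grappling statistics, time since last fight, and Elo-style ratings.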

The goal of engineering these features is to represent both fighter quality and matchup structure. I want the model to know who has been better historically, but I also want it to understand why one fighter's strengths might or might not matter against this specific opponent.

Physical attributes include age, height, reach, weight class, and stance. These are simple features, but they matter more than they might seem at first. Age is especially important because fighter decline is not linear. A 31-year-old and a 33-year-old may not be very different, but a 38-year-old and a 40-year-old can be dramatically different, especially in lighter weight classes where speed, reaction time, and durability matter more.

A basic linear model might assume:

$$\text{logit}(p) = \beta_0 + \beta_1 \cdot \text{age}$$

This says each additional year of age has the same effect. But age in MMA probably has some kind of threshold effect:

$$\text{effect}(\text{age}) = \begin{cases} \text{small}, & \text{if age} < 35 \\ \text{large}, & \text{if age} \geq 35 \end{cases}$$

Even still, that is too simple, because heavyweights age differently from flyweights, grapplers age differently from strikers, and defensively responsible fighters age differently from fighters who absorb huge amounts of damage. This is one of the reasons why tree-based models (like Random Forest and XGBoost) are useful for modeling MMA. These kinds of models can learn cutoff points and nonlinear splits without me having to manually specify every single possible threshold and edge case. Instead, the model can learn some of those patterns from the data, assuming the data is good enough.

For striking, raw statistics include significant strikes landed per minute, significant strikes absorbed per minute, striking accuracy, striking defense, knockdowns, and strike differential.

A basic striking differential calculation might be:

$$\text{strike differential} = \text{SLpM} - \text{SApM}$$

where:

$$\text{SLpM} = \text{significant strikes landed per minute}$$

and:

$$\text{SApM} = \text{significant strikes absorbed per minute}$$

But again, it simply isn't enough to just look at the raw striking averages between two fighters. We need the context behind these stats. A fighter landing 6 significant strikes per minute might be genuinely elite, or they might have fought opponents who were super poor defensively. A fighter absorbing 5 strikes per minute might have bad defense, or they might simply fight at an insane pace.

Because of all the nuances behind these averages, I also try calculating relative striking features:

$$
\begin{aligned}
\text{striking output diff} &= \text{SLpM}_A - \text{SLpM}_B \\
\text{striking defense diff} &= \text{StrDef}_A - \text{StrDef}_B
\end{aligned}
$$

I also experimented with pressure-style features. For example:

$$\text{striking pressure} = \text{significant strikes landed} \times \text{striking accuracy}$$

Then the matchup version becomes:

$$\text{striking pressure diff} = \text{pressure}_A - \text{pressure}_B$$

While this isn't perfect, the idea is to capture more than just striking volume. A fighter who throws a lot but misses constantly is different from a fighter who throws few punches, but lands efficiently (we like to call these guys snipers). MMA stats are super noisy, and this one feature does not solve striking matchups, but it is a useful way to encode pressure in a way that the model can use to make predictions.

For grappling, raw features include takedown average, takedown accuracy, takedown defense, submission attempts, control time if available, and wrestling-heavy win patterns. A simple grappling pressure feature might look like:

$$\text{grappling pressure} = \text{takedowns attempted or landed} + \text{submission attempts}$$

Then:

$$\text{grappling pressure diff} = \text{grappling pressure}_A - \text{grappling pressure}_B$$

The reason I like this kind of feature is that grappling is not just about whether a fighter lands takedowns. The threat of the takedown itself changes the dynamics of the fight. A striker who fears the takedown may throw less freely. A kicker may become hesitant. A fighter with great boxing may suddenly look uncomfortable because they are constantly defending level changes.

This is one of the hardest things to model in MMA. The box score does not always capture the pressure a fighter creates. A fighter may go 1 for 8 on takedowns and still completely change the way their opponent fights, because the opponent spends fifteen minutes worried about being put on their back. It's why a lot of people will cope and say that their favorite striker got outstruck by a wrestler due to "the threat of the takedown."

Another important feature is time since last fight:

$$\text{days since last fight} = \text{fight date} - \text{previous fight date}$$

This is useful because MMA careers are not stationary. A fighter's past statistics will not always describe the current version of that fighter. Someone may look great in their historical data, but if they have not fought in two years, switched camps, moved weight classes, and aged out of their physical prime, the old numbers may not mean as much.
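Here is a minimal sketch of the layoff feature using Python's standard `datetime` module. The function name and the choice to return `None` for a debut are my own assumptions for illustration; the dates are made up.

```python
from datetime import date


def days_since_last_fight(fight_date, previous_fight_dates):
    """Days between this bout and the fighter's most recent earlier bout.

    Only dates strictly before fight_date are considered, so a future
    fight can never leak into the feature. Returns None for a debut.
    """
    earlier = [d for d in previous_fight_dates if d < fight_date]
    if not earlier:
        return None
    return (fight_date - max(earlier)).days


# Made-up fight history for illustration.
history = [date(2022, 6, 11), date(2024, 1, 13), date(2025, 3, 8)]
layoff = days_since_last_fight(date(2024, 7, 13), history)
```

Note that the 2025 date in the history is ignored when computing the layoff for the 2024 fight, which is the same pre-fight-only rule the rest of the pipeline follows.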

Dynamic Elo

In my previous post, I talked about how Elo is heavily flawed for MMA. But while the system has problems, Elo as a feature is still useful because it gives a compact estimate of historical performance. It's just not good enough to use by itself. So instead of using Elo as the whole model, I use Elo-style ratings as features inside the model.

The standard Elo expected score can be calculated as:

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

Then the update is:

$$R_A' = R_A + K(S_A - E_A)$$

For MMA, I want the rating to be dynamic and pre-fight only. A fighter's Elo before a fight should be calculated using only fights that happened before that date.

That means for every bout, I need:

$$R_A(t)$$

and:

$$R_B(t)$$

where $R(t)$ means the rating immediately before the fight at time $t$.

Then I can create features like:

$$\text{Elo diff} = R_A(t) - R_B(t)$$

Elo now becomes one signal among many. It can tell me that one fighter has historically beaten stronger competition, but it cannot fully tell me whether their style matches up well against this opponent, which is why it feeds into a broader machine learning model rather than standing alone.

I also experimented with making Elo more MMA-specific. A split decision win over a low-level opponent should not mean the same thing as a dominant knockout over a high-level opponent. There are many ways to modify this, such as method-adjusted Elo, time-decayed Elo, or weighting recent fights more heavily than older fights. You should not overengineer it, but the basic idea is that a fighter's rating should reflect both whether they won and how meaningful that win was.
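The pre-fight-only rule is easy to get wrong, so here is a minimal sketch of how the ratings can be built chronologically. The tuple layout, the `K = 32` factor, and the starting rating of 1500 are assumptions for illustration, not the values my model uses, and this is plain Elo without any method or time-decay adjustments.

```python
def expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def prefight_elo(fights, k=32.0, initial=1500.0):
    """Build pre-fight Elo features.

    fights: list of (date, fighter_a, fighter_b, a_won) tuples, where date
    is an ISO string such as "2024-03-01". Returns one row per fight with
    the ratings as they stood immediately before that fight.
    """
    ratings = {}
    rows = []
    for fight_date, a, b, a_won in sorted(fights, key=lambda f: f[0]):
        r_a = ratings.get(a, initial)
        r_b = ratings.get(b, initial)
        # Record features BEFORE updating, so the result cannot leak in.
        rows.append({
            "date": fight_date,
            "fighter_a": a,
            "fighter_b": b,
            "elo_a": r_a,
            "elo_b": r_b,
            "elo_diff": r_a - r_b,
        })
        e_a = expected_score(r_a, r_b)
        s_a = 1.0 if a_won else 0.0
        ratings[a] = r_a + k * (s_a - e_a)
        ratings[b] = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return rows


# Made-up fights, deliberately out of order.
fights = [
    ("2024-03-01", "A", "C", True),
    ("2023-01-01", "A", "B", True),
]
rows = prefight_elo(fights)
```

The key detail is the order of operations inside the loop: the feature row is written first, and the rating update happens afterward.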

Categorical Boosting

My current model uses CatBoost, a tree-based model that works well with tabular data and categorical features.

CatBoost

I use CatBoost because MMA data is tabular, messy, nonlinear, and relatively small. These are exactly the kinds of conditions where gradient-boosted tree models tend to perform well.

A linear model tries to learn something like:

$$\text{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

Now that can be useful, but it assumes a fairly smooth relationship between features and log-odds unless you manually add nonlinearities and interaction terms. MMA is filled with nonlinear interactions.

Gradient-boosted trees are useful because they build an ensemble of decision trees, where each tree partitions the feature space into different regions. For example, a tree might learn that if Fighter A is older than 36, Fighter B is younger than 30, Fighter B has high striking pressure, and Fighter A absorbs a lot of strikes, then Fighter A's win probability should be reduced.

Another branch might learn that if Fighter A has high takedown accuracy, Fighter B has weak takedown defense, and Fighter B has high striking volume, then Fighter A's wrestling threat may be especially valuable.

These are simplified examples, but they capture why tree-based methods are appealing here. The model does not need me to manually specify every possible interaction. It can learn many of them from the data.

CatBoost is also useful because it handles categorical features well, and MMA is filled with categorical labels. The most obvious ones include stance and weight class.

A common way to handle categories is target encoding, where you replace a category with the average outcome for that category. The problem is that if you compute that average using the whole dataset, you can accidentally leak the target into the feature.

For example, imagine encoding a fighter's stance using the average win rate of that stance across the whole dataset. If the current row's outcome is included in that average, then the model has indirectly seen part of the answer. Subtle mistakes like these can lead to models that are catastrophically wrong when making predictions in the real world.

CatBoost tries to avoid this through ordered target statistics. Basically, for a given row, categorical statistics are computed using only earlier rows in a random permutation, instead of using the entire dataset at once:

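$$\hat{x}_{ik} = \frac{\sum_{j=1}^{p-1} \mathbb{1}[x_{jk} = x_{ik}] \, y_j + aP}{\sum_{j=1}^{p-1} \mathbb{1}[x_{jk} = x_{ik}] + a}$$

where $P$ is a prior, $a$ is a smoothing parameter, $p$ is the position of the current row in the permutation, and only rows before the current row in the permutation are used. Essentially, CatBoost is designed to reduce a kind of target leakage that can easily show up when working with categorical features.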
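To show the idea in isolation, here is a simplified Python sketch of an ordered target statistic. It is not CatBoost's internal implementation (the library handles this for you and does more than a single pass); the function name, the optional `order` argument, and the sample data are all my own assumptions for illustration.

```python
import random


def ordered_target_stats(categories, targets, prior, a=1.0, order=None, seed=0):
    """Encode each row's category using only rows earlier in a permutation.

    categories: list of category labels (e.g. stance)
    targets:    list of 0/1 outcomes
    prior:      prior value P, e.g. the overall win rate
    a:          smoothing weight on the prior
    order:      optional explicit permutation of row indices
    """
    n = len(categories)
    if order is None:
        order = list(range(n))
        random.Random(seed).shuffle(order)

    sums = {}    # running sum of targets per category
    counts = {}  # running count of rows per category
    encoded = [0.0] * n
    for idx in order:
        cat = categories[idx]
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        # The current row's own target is NOT included yet.
        encoded[idx] = (s + a * prior) / (c + a)
        sums[cat] = s + targets[idx]
        counts[cat] = c + 1
    return encoded


stances = ["orthodox", "orthodox", "southpaw"]
wins = [1, 0, 1]
encoded = ordered_target_stats(stances, wins, prior=0.5, a=1.0, order=[0, 1, 2])
```

Because each row is encoded before its own outcome is added to the running totals, changing a row's label never changes that row's encoded value.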

In sports modeling, it is very easy to accidentally build a model that looks good because it has seen information it should not have seen (in fact, this was one of the first mistakes I made when building my initial model). The way CatBoost is designed helps prevent this particular mistake, and while it doesn't prevent all leakage, it is more robust than naive target encoding computed over the whole dataset.

Training Without Time Travel

One of the most important parts of the model development is the train/test split. In many machine learning projects, people use a random train/test split, or use methods such as k-fold cross-validation. These methods can work when the data is independent and identically distributed.

However, sports data is time-dependent.

If I choose a random train/test split, I might train on future fights and test on earlier fights. This means that the model can indirectly learn information that would not have existed at the time of prediction.

For example, suppose Fighter A fought in 2020, 2022, and 2024. If the model trains on the 2024 fight and tests on the 2022 fight, then it has learned from the fighter's future. This is completely unrealistic, and will only harm the model's ability to predict fights in the present.

Due to the nature of my data being time sensitive, I use a chronological split.

$$\mathcal{D}_{\text{train}} = \{\text{fights before time } T\}$$

$$\mathcal{D}_{\text{test}} = \{\text{fights after time } T\}$$

So essentially, given everything I knew before this date, could I have predicted fights after this date?

This is stricter than a random split, but it is much more honest. It also makes backtesting the project more frustrating because the model performance can look worse than it would under a random split. However, I would rather have a model that looks worse on paper than a model that looks great because future data leaked into the training set.
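A chronological split is only a few lines of code. This sketch assumes each fight is a dict with an ISO-format `date` string, which is a hypothetical layout; fights that fall exactly on the cutoff are sent to the test set here, which is a choice on my part.

```python
def chronological_split(fights, cutoff):
    """Split fights into train (before cutoff) and test (on or after cutoff).

    fights: list of dicts with an ISO "date" string, e.g. "2023-06-10"
    cutoff: ISO date string T
    """
    train = [f for f in fights if f["date"] < cutoff]
    test = [f for f in fights if f["date"] >= cutoff]
    return train, test


# Made-up rows for illustration.
fights = [
    {"date": "2020-05-09", "fighter": "A"},
    {"date": "2024-02-17", "fighter": "A"},
    {"date": "2022-09-10", "fighter": "A"},
]
train, test = chronological_split(fights, cutoff="2023-01-01")
```

With this split, Fighter A's 2024 fight can never end up in the training set while the 2022 fight sits in the test set.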

Model Objective and Calibration

The model outputs a probability:

$$\hat{p}_i = P(y_i = 1 \mid X_i)$$

For a binary classifier, the natural loss function is log loss:

$$\text{Log Loss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{p}_i) + (1 - y_i) \log(1 - \hat{p}_i) \right]$$

Log loss is useful because it punishes confident wrong predictions. If the model predicts 51% and is wrong, that is not too bad. If the model predicts 95% and is wrong, that is extremely bad. In a trading setting, this matters because confident wrong predictions are exactly how you blow up.

Another metric I care about is the Brier score:

$$\text{Brier Score} = \frac{1}{N} \sum_{i=1}^{N} (\hat{p}_i - y_i)^2$$

The Brier score measures the squared error of the predicted probabilities.
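Both metrics are simple enough to write by hand. The sketch below uses only the standard library; the clipping constant `eps` is an assumption I added so that a prediction of exactly 0 or 1 does not produce an infinite loss.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def brier_score(y_true, p_pred):
    """Mean squared error between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


# The same wrong pick at two confidence levels (the fighter lost, y = 0).
mild_miss = log_loss([0], [0.51])
confident_miss = log_loss([0], [0.95])
```

Here `confident_miss` comes out several times larger than `mild_miss`, which is the "confident wrong predictions get punished" behavior described above.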

Accuracy is useful, but it alone is not good enough to assess how good my model is. A model can have decent accuracy and still be useless for trading if its probabilities are calibrated poorly. If the model says a fighter has a 75% chance to win, that number has to actually mean that the fighter's win probability is close to that value. Otherwise, all of my expected value calculations are garbage.

Calibration means that predicted probabilities correspond to real-world frequencies. If my model says a fighter has a 70% chance to win, then over many fights where the model predicts 70%, the fighter should win about 70% of the time.

Formally:

$$P(Y = 1 \mid \hat{p} = p) = p$$

A model can be directionally accurate but badly calibrated. For instance, suppose the model is good at knowing which fighter is more likely to win, but it is too aggressive in its predictions. It outputs 80% probabilities for fights that are really closer to 65%. That may still produce decent accuracy overall, but it will make horrible betting decisions because the expected value calculation depends directly on the probability.
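One simple way to check this is a reliability table: bucket the predictions, then compare the average predicted probability in each bucket with the observed win rate. This is a minimal sketch with equal-width bins; the function name, the number of bins, and the sample data are assumptions for illustration.

```python
def reliability_table(y_true, p_pred, n_bins=5):
    """Compare mean predicted probability to observed win rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    table = []
    for idx, rows in enumerate(bins):
        if not rows:
            continue  # skip empty bins
        table.append({
            "bin": idx,
            "n": len(rows),
            "mean_pred": sum(p for p, _ in rows) / len(rows),
            "win_rate": sum(y for _, y in rows) / len(rows),
        })
    return table


# Made-up example of an overconfident model: 20 fights predicted at 80%,
# of which only 13 were actually won.
outcomes = [1] * 13 + [0] * 7
predictions = [0.80] * 20
table = reliability_table(outcomes, predictions)
```

In this toy case the single populated bin shows a mean prediction near 0.80 against a win rate of 0.65, which is exactly the overconfidence pattern described above.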

The trading edge is:

$$\text{edge} = p_{\text{model}} - p_{\text{market}}$$

If $p_{\text{model}}$ is overconfident, then the edge does not actually exist.

In my experiments, I compare raw model probabilities against calibrated versions. Two common calibration methods are sigmoid calibration and isotonic calibration.

Sigmoid calibration, also called Platt scaling, fits a logistic transformation:

$$p_{\text{calibrated}} = \frac{1}{1 + e^{-(a \cdot s + b)}}$$

where s is the model score and a and b are learned calibration parameters.

Isotonic calibration is more flexible and fits a monotonic function:

$$p_{\text{calibrated}} = g(s)$$

where g is constrained to be non-decreasing.

The tradeoff is that isotonic calibration can adapt to more complex miscalibration, but it can also overfit when the calibration set is small. Because my dataset is limited, this is a valid concern. In this kind of project, the fanciest calibration method is not automatically the best one. A lot of times, the simpler method is more stable because there just is not that much data to work with.

One subtle issue with calibration is that the calibrator should not be trained on predictions from a model that already saw those same examples during training. If I train CatBoost on a set of fights, generate predictions on that same set, and then fit a calibration model on those predictions, the calibration step may learn the model's overfitted confidence instead of its true out-of-sample behavior.

A better approach is to use out-of-fold predictions:

  1. Split the training data into folds.
  2. Train the base model on all but one fold.
  3. Predict probabilities for the held-out fold.
  4. Repeat until every row has an out-of-fold prediction.
  5. Fit the calibrator on those out-of-fold predictions.

That way, the calibrator sees predictions that are closer to what the model would produce on unseen fights.
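The out-of-fold steps above can be sketched in plain Python. This is a skeleton, not my actual training code: `fit` and `predict` are placeholder callables, the toy base-rate model stands in for CatBoost, and assigning folds by row index modulo `n_folds` is an assumption (for time-ordered fights you may prefer folds that respect chronology).

```python
def out_of_fold_predictions(X, y, fit, predict, n_folds=5):
    """Return one prediction per row, each from a model that never saw that row."""
    n = len(X)
    oof = [None] * n
    for k in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == k]
        kept = [i for i in range(n) if i % n_folds != k]
        # Train on every fold except fold k.
        model = fit([X[i] for i in kept], [y[i] for i in kept])
        # Predict only the held-out rows.
        preds = predict(model, [X[i] for i in held_out])
        for i, p in zip(held_out, preds):
            oof[i] = p
    return oof


# Toy stand-in model: predicts the training win rate for every row.
def fit_base_rate(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_base_rate(model, X_rows):
    return [model for _ in X_rows]


X = [[0], [1], [2], [3], [4], [5]]
y = [1, 0, 0, 0, 0, 0]
oof = out_of_fold_predictions(X, y, fit_base_rate, predict_base_rate, n_folds=3)
```

In the toy data, row 0 is the only win, and its out-of-fold prediction is 0.0 because the model that scored it never saw that label. The calibrator would then be fit on `oof` against `y`.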

Conceptually:

$$\hat{p}_i^{\text{OOF}} = f_{-k}(x_i)$$

where $f_{-k}$ is the model trained without fold $k$, the fold containing row $i$.

Then the calibrator learns:

$$p_{\text{calibrated}} = g(\hat{p}^{\text{OOF}})$$

For a trading model, bad calibration creates fake edge. This is why calibration is not just a cosmetic step; it is central to whether the model can actually be used for markets.

From Predictions to Trades

Once the model outputs probabilities, the next step is to compare those probabilities to market prices.

Something that I had to internalize as I was making this model was that market prices are not the truth. A prediction market price is really just an aggregate belief. If a contract trades at $0.62, that means the market is roughly implying a 62% probability, but that doesn't necessarily mean that the true probability is 62%. It just means that, given the current market participants, information, risk preferences, and other factors, the contract is clearing around that price.

When I compare my model probability to the market, I am merely comparing one probability estimate against another probability estimate. It is entirely possible that both my model and the market are completely wrong.

However, the market has some advantages that my model does not. For example, the market can aggregate opinions based on sentiment and news: participants may have information about a fighter's injuries, weigh-ins, good or bad fight camps, and other domain knowledge. But my model has its own advantages too. My model is systematic, historically tested, not driven by emotion, and consistent across fights.

The raw edge is calculated as:

$$\text{edge} = p_{\text{model}} - p_{\text{market}}$$

But the more realistic version is:

$$\text{true edge} = p_{\text{model}} - p_{\text{market}} - \epsilon$$

where $\epsilon$ represents everything I have not modeled, like fees, bad data, model error, liquidity, and other hidden information. I don't want to bet every small difference between my model and the market. I only care about the differences that are large enough to survive that error term.

For a prediction market contract priced at c:

$$\text{edge} = p_{\text{model}} - c$$

For sportsbook odds, I first convert the odds to implied probability, remove the vig, and compare against the model.
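For reference, here is a minimal sketch of that conversion for American odds. Removing the vig by rescaling the two implied probabilities so they sum to 1 is one common approach and an assumption on my part; other de-vig methods exist. The odds in the example are made up.

```python
def implied_probability(american_odds: float) -> float:
    """Convert American odds to the raw implied probability (vig included)."""
    if american_odds > 0:
        return 100.0 / (american_odds + 100.0)
    return -american_odds / (-american_odds + 100.0)


def remove_vig(odds_a: float, odds_b: float):
    """Rescale both sides' implied probabilities so they sum to 1."""
    raw_a = implied_probability(odds_a)
    raw_b = implied_probability(odds_b)
    total = raw_a + raw_b  # greater than 1 when the book charges vig
    return raw_a / total, raw_b / total


# Made-up line: favorite at -150, underdog at +130.
fair_fav, fair_dog = remove_vig(-150, 130)
edge = 0.48 - fair_dog  # hypothetical model probability for the underdog
```

The raw implied probabilities for this line add up to more than 100%, and the rescaled pair is what gets compared against the model.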

In practice, I want something that is closer to this:

$$p_{\text{model}} - p_{\text{execution}} > \text{fees} + \text{slippage} + \text{uncertainty buffer}$$

The reason for this is that theoretical edge is not the same thing as executable edge. Let's assume my model says a contract is worth 60 cents and I can buy it at 55 cents. Initially, that looks like a five-point edge. But in the market, I still have to care about the bid-ask spread, fees, liquidity, and whether or not I can actually get filled at that price.

That means that the model's edge has to be large enough to survive the actual execution price as opposed to the theoretical market midpoint.
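As a sketch, the executable-edge check is just that inequality turned into a function. The parameter names and the example fee, slippage, and buffer values are hypothetical placeholders, not my actual trading thresholds.

```python
def should_trade(p_model, execution_price, fees=0.0, slippage=0.0,
                 uncertainty_buffer=0.05):
    """True when the edge at the price I can actually get clears all costs.

    All quantities are in probability units (dollars per $1 contract).
    """
    raw_edge = p_model - execution_price
    required = fees + slippage + uncertainty_buffer
    return raw_edge > required


# The 60-cent vs 55-cent example: a five-point raw edge that does not
# survive one cent of fees, one cent of slippage, and a five-point buffer.
small_edge_ok = should_trade(0.60, 0.55, fees=0.01, slippage=0.01)

# A much wider gap clears the same hurdle.
large_edge_ok = should_trade(0.30, 0.10, fees=0.01, slippage=0.01)
```

The point of keeping this as its own function is that the hurdle lives in the trading layer, so it can be tuned without retraining the model.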

This is where I think it helps to separate the model from the trading layer. The model is supposed to estimate a win probability for a fighter, and the trading layer decides whether or not the probability is useful based on price, fees, uncertainty, and other variables around the estimate.

The trading layer can add a bunch of filters, such as taking bets if the model edge is above a certain threshold, avoiding placing bets on fighters who have little data, staying away from trading when the model probability is too close to 50%, etc.

I think this separation of responsibilities is important because a prediction model can be good for making predictions, but bad for making trades. Maybe it always picks the fighter who wins, but the probabilities are poorly calibrated, or maybe it is calibrated decently, but can't provide a useful trade after fees. The model can give us a probability, but the trading system will decide if the probability is actionable.

Bet Sizing and Risk

Once an edge has been calculated and confirmed, the next question is about how much money to bet.

A common starting point for bet sizing is the Kelly criterion. For a bet with net odds $b$, model win probability $p$, and loss probability $q = 1 - p$, the Kelly fraction is:

$$f^* = \frac{bp - q}{b}$$

where $f^*$ is the fraction of bankroll to wager.

For example, suppose my model says a fighter has a 55% chance to win at +120 odds. At +120, the net odds are:

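$$b = 1.20$$

The model probability is:

$$p = 0.55$$

and:

$$q = 1 - p = 0.45$$

Then:

$$
\begin{aligned}
f^* &= \frac{1.20(0.55) - 0.45}{1.20} \\
f^* &= \frac{0.66 - 0.45}{1.20} \\
f^* &= 0.175
\end{aligned}
$$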

Full Kelly would suggest that I bet 17.5% of my bankroll, which, quite frankly, is insane. Betting 17.5% of my total bankroll on one fight in a high variance sport like MMA is far too aggressive and irresponsible. The problem is that the Kelly criterion is optimized for maximum growth, and assumes that the probability estimate is correct. But as I stated previously, my probability estimate is just an estimate, not a known, true probability. There are too many factors coming into a fight that can affect the outcome, from bad weight cuts, to judging biases, issues in camp, and other random nonsense.

So the more realistic approach is to use fractional Kelly:

$$f = \alpha f^*$$

where:

$$0 < \alpha < 1$$

I would also cap maximum position size:

$$f = \min(\alpha f^*, f_{\max})$$

The purpose of using fractional Kelly and bet size caps is to survive the variance of the bets. I care about the correlated risk of my bets across a UFC card. If my model systematically overrates certain types of fighters, then several of my bets on the same card may fail for the same reason. Even if each individual bet looks like it's positive EV, the total exposure may be too concentrated. So I care about the maximum stake per fight, maximum exposure per card, and avoiding too many low-data fights where anything could happen, and I will reduce my position size when my model confidence only comes from a few fragile features.
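Here is a small sketch of the sizing rule, reusing the +120 example from above. The quarter-Kelly multiplier `alpha = 0.25` and the 2% cap `f_max = 0.02` are placeholder values I picked for illustration, not my real risk settings.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Full Kelly fraction f* = (b*p - q) / b for net odds b and win prob p."""
    q = 1.0 - p
    return (b * p - q) / b


def stake_fraction(p: float, b: float, alpha: float = 0.25,
                   f_max: float = 0.02) -> float:
    """Fractional Kelly with a hard cap; never bet when f* is not positive."""
    f_star = kelly_fraction(p, b)
    if f_star <= 0:
        return 0.0
    return min(alpha * f_star, f_max)


# The worked example: 55% model probability at +120 (b = 1.20).
full_kelly = kelly_fraction(0.55, 1.20)   # approximately 0.175
capped = stake_fraction(0.55, 1.20)       # quarter Kelly, then capped at 2%
```

Quarter Kelly alone would suggest roughly 4.4% of bankroll here, so in this example it is the cap, not the multiplier, that ends up setting the stake.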

Paper Trading Results

Before using any real money to place bets, I wanted to paper trade across a few UFC events.

Paper trading is when you simulate bets using model predictions and market odds without actually placing the trades. For each fight, you generate a model probability, get the market-implied probability, remove the vig or use the executable prediction market price, calculate the edge, apply filters, size your bet, and record the result. From there you can track your bankroll, ROI, drawdown, and your model calibration. I care about whether my model is picking the right fighters to win, but more importantly, I care about whether the model would have made good trades.

For prediction markets, if I buy a contract at price c and it resolves to 1:

$$\text{profit} = 1 - c$$

If it resolves to 0:

$$\text{profit} = -c$$

Over many trades, I can track:

$$\text{ROI} = \frac{\text{total profit}}{\text{total amount risked}}$$
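A paper-trading ledger only needs these two formulas. The sketch below ignores fees, and the trade records are invented purely to show the arithmetic; they are not my actual paper trades.

```python
def contract_profit(price: float, resolved_yes: bool, contracts: int = 1) -> float:
    """Profit on a binary contract bought at `price` (fees ignored)."""
    payoff = 1.0 if resolved_yes else 0.0
    return contracts * (payoff - price)


def roi(trades) -> float:
    """Total profit divided by total amount risked across all trades."""
    total_profit = sum(
        contract_profit(t["price"], t["won"], t["contracts"]) for t in trades
    )
    total_risked = sum(t["price"] * t["contracts"] for t in trades)
    return total_profit / total_risked


# Invented ledger: one winning underdog contract, one losing coin flip.
ledger = [
    {"price": 0.25, "won": True, "contracts": 100},   # risk $25, profit $75
    {"price": 0.50, "won": False, "contracts": 100},  # risk $50, profit -$50
]
ledger_roi = roi(ledger)
```

This toy ledger also shows how much a single underdog hit can move the number: one win out of two trades still produces a positive ROI.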

I paper traded across April and May 2026 and got some pretty encouraging results. Using the model's predicted probabilities, comparing them against available market prices, and only taking bets where the edge cleared my threshold, the simulated bankroll generated roughly 50% returns over that period.

That number is obviously very exciting, especially after only four weeks' worth of events, but I don't want to overstate it. A 50% return over a short time period does not prove that the model has permanent edge. MMA is very high variance, and a few correct underdog predictions made by my model can inflate my short-term results massively.

So my paper trading results are not so much definitive proof, but rather evidence that my approach is worth continuing. My model was actually able to find spots where the estimated probabilities differed meaningfully from the market's implied probability, and finding mispriced probabilities is what quantitative trading and sports betting are all about.

UFC 328: A Live Example

The first time this project felt especially real was UFC 328: Strickland vs. Chimaev. The event was held at the Prudential Center in Newark, NJ. As a New Jersey resident and UFC fan, I absolutely felt obligated to attend. However, this time, I decided to actually place some bets using my model.

Faceoff

For that card, my model found what it thought was a strong mispricing on the main event underdog, Sean Strickland. The market price implied roughly a 10% probability to win, whereas my model predicted a 30% chance or so for Strickland. My model's fair probability was meaningfully higher, and that made it exactly the kind of spot that I was looking for. I didn't think Strickland was going to win the fight, but the market was underpricing him relative to my model.

I ended up spending about $58 betting on Sean Strickland, and somehow, the madman won by split decision, and I profited $492 from that one bet alone. I also placed bets on some other fights on the card using my model's recommendations. In total, I staked around $398, and my positions netted me roughly $1150 in profit.

Win

Obviously, one event doesn't prove anything by itself, but it was still a useful experience because it showed the full pipeline of my model working end to end. My model took a fight and made a prediction about who would win (it still picked Chimaev), identified the market price as wrong, and recommended a trade with a clear reason behind it before the outcome happened. This process was repeated across the whole card to find mispriced probabilities.

Next Steps and Caveats

Although my model worked, it has not solved MMA betting. The sport is still incredibly volatile and young, and there are many obvious limitations such as small datasets, hidden injuries, bad weight cuts, judging variance, etc. A CatBoost model trained on historical UFC fight data won't magically know everything happening behind the scenes.

That being said, I think the early results of my betting model are promising enough to keep building and expanding upon. The next steps are to improve the feature engineering, especially opponent-adjusted stats, and to build better market tooling around prediction market prices, order books, spreads, and liquidity. I also want to keep paper trading across future cards to see whether the model is actually finding repeatable mispricings in the market, or if it's just producing plausible-looking probabilities.

Conclusion

MMA is a very strange and tough modeling problem because it has just enough data and structure to tempt people into betting, but is volatile enough to be dangerous and blow up your bankroll.

The early results of my betting experiments were very encouraging. Across the months of April and May 2026 the model generated roughly 50% simulated returns under my betting rules. On UFC 328, it identified a variety of strong trades, where I put around $398 at risk and made nearly $1150 in profit.

These results haven't solved the combat sports betting problem, but they are enough to convince me that the project is worth continuing. Who knows, maybe I'll build the best betting model in the history of MMA.