Warrior BiB not simming as BiB

prime · February 2, 2020, 11:39pm

It suggests a vers weapon enchant and gems, but if I lock in crit for gems/enchant, it sims higher. This is while using no azerite or gem threshold, and never going over my 39 corruption cap. I did notice that when I lock in crit enchants/gems, it also changes some of the items it recommends. I assume this is some type of bug, since it should select the actual best combo of everything without me having to lock in items, right?

Swol · February 3, 2020, 2:01am

So, equipping the Igneous Winterskorn Loop results in a DPS increase. It doesn’t really matter if you lock in any gems/enchants. You could simulate with the versatility enchant on one of your weapons and still get the same DPS.

The optimizer is having a hard time finding the combo with both the corrupted rings. We can take a look and see if there is a way to tweak it to go for that combo. When you’re dealing with multiple stat procs on gear, it gets really hard to predict the simulated DPS as closely as when you’re dealing with static stats.

prime · February 3, 2020, 2:40am

Well, technically it does sim higher, but yeah, I guess it was the ring making the big difference. I simmed with the Igneous Winterskorn Loop locked and the rest of the suggested settings, then simmed the same gear with just a crit/haste weapon enchant rather than the vers one, and it came out higher, though only slightly.

https://www.askmrrobot.com/wow/simulator/report/3b3819b56e1b44c086c7a721213999f8
https://www.askmrrobot.com/wow/simulator/report/ed9c985632d44476acbba2ae533ab64a

Swol · February 3, 2020, 4:46am

A difference that small is statistically insignificant. There’s actually a formula you can use to show those two results don’t have a statistically significant difference. The optimizer would be working correctly if it picked either of those sets of gear.
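The thread doesn't say which formula is meant, but one standard choice is a two-sample z-test on the two mean DPS values. Below is a minimal sketch, assuming each sim report gives a mean DPS and its standard error (standard deviation divided by the square root of the iteration count), and that the two runs are independent and roughly normal. The function name and the example numbers are hypothetical and are not taken from the linked reports.

```python
import math

def dps_difference_significant(mean_a, se_a, mean_b, se_b, alpha=0.05):
    """Two-sided z-test for the difference between two independent sim results.

    mean_*: mean DPS of each sim; se_*: standard error of that mean.
    Returns (z, p_value, significant).
    """
    diff = mean_a - mean_b
    # Standard error of the difference of two independent means
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value, p_value < alpha

# Hypothetical example: two gear sets 50 DPS apart, each with a 40 DPS standard error
z, p, significant = dps_difference_significant(30150, 40, 30100, 40)
```

With these made-up numbers, z is under 1 and p is well above 0.05, so the two results can't be told apart. A 1,000 DPS gap with the same standard errors would be clearly significant, which is the distinction being drawn here.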

Comparing a very small number of data points (in this case, 2) that differ by less than the margin of error provides little insight. We gather large sets of data, analyze that data, and smooth out the noise to be able to find good sets of gear. Unless there is a large, statistically significant discrepancy between what the optimizer picks and the actual simulation results, the optimizer is doing a good job.

Even the 1k difference from not picking the two-ring combo is really on the edge of what can be expected of a gear optimizer with the huge number of special effects in the game right now. You wouldn't even be able to tell the difference between two sets of gear 1k apart just playing the game, which makes it even harder. It is actually pretty much impossible to verify the results of gear optimization in-game below a fairly large threshold, something like 3-4%.

prime · February 3, 2020, 3:48pm

My point is that, even though it is statistically insignificant, when we don't set any threshold on gems/enchants (and there are options like 0.25%), and after repeated simulations it still sims higher with other enchants (haste/crit over crit/vers), shouldn't there be something in place to make it choose the better one? Even if it is practically impossible to verify in-game?

Swol · February 3, 2020, 4:35pm

We aren’t actually simulating the exact sets of gear that you are comparing. So we really don’t know exactly what any given set of gear would “simulate to” while we are optimizing. We have created a scoring function that very closely predicts what your theoretical (simulated) result would be - without actually doing the simulation. That is what allows us to go through ALL the combos of gear you have.
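To illustrate the general idea of scoring instead of simulating, here is a minimal sketch, assuming a scoring function made of simple linear stat weights. The weights, item names, and helpers are hypothetical, and a real scoring function has to model far more (for example, the stat procs mentioned earlier); this only shows why a cheap score makes it feasible to check every combination.

```python
from itertools import product

# Hypothetical stat weights (predicted DPS per point of stat). In practice a
# scoring function would be fitted from simulation data; these are made up.
WEIGHTS = {"crit": 0.55, "haste": 0.50, "mastery": 0.40, "vers": 0.45}

def score(gear_set):
    """Predict DPS for a gear set by summing weighted stats (no simulation)."""
    total = 0.0
    for item in gear_set:
        for stat, amount in item["stats"].items():
            total += WEIGHTS.get(stat, 0.0) * amount
    return total

def best_set(options_by_slot):
    """Score every combination (one option per slot) and return the best one.

    max() keeps the first of any tied combinations, so the same input
    always produces the same answer.
    """
    return max(product(*options_by_slot.values()), key=score)

# Hypothetical example with two slots and two options each
example = {
    "finger": [
        {"name": "Ring A", "stats": {"crit": 100}},
        {"name": "Ring B", "stats": {"vers": 120}},
    ],
    "weapon_enchant": [
        {"name": "Crit/Haste", "stats": {"crit": 40, "haste": 40}},
        {"name": "Crit/Vers", "stats": {"crit": 40, "vers": 40}},
    ],
}
winner = best_set(example)
```

Here `winner` is Ring A with the Crit/Haste enchant. Because the score is computed rather than simulated, the result is deterministic, which is the consistency property described in the next reply.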

One advantage of how we pick gear is that you always get the same result, even when the candidates are very close to each other. In your case, the two data points are so close together that if you ran those simulations again, the result could flip; they're within the margin of error of each other. Our optimization algorithm smooths out the inherent noise in the simulation results so that we can give consistent results that track very closely to what the simulations suggest.

We generally say that we can pick you a set of gear that is practically indistinguishable from the theoretically "best" set of gear, the one you would ultimately find if you spent a few weeks brute-force simulating. "Practically" is the important word there: by that we mean you couldn't actually go in-game and notice a difference, assuming you can play all gear builds equally well.

prime · February 3, 2020, 11:44pm

Oh, I thought the whole point of the global network was that most gear combinations were being (or would be) simulated and then applied to the BiB option to help find the best gear. But instead it's just used to find a decent scoring function?

Swol · February 4, 2020, 2:20am

It would be prohibitive to simulate every relevant option of gear for even one player with just 3 relevant items to pick from in each slot. The only possible way to examine all possible sets of relevant gear is to use some sort of statistical analysis - that is what sets AMR apart from other types of gear optimization.
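A quick back-of-the-envelope count shows why. This sketch assumes 16 gear slots, each chosen independently from 3 options (which slightly overcounts paired slots like rings and trinkets, and ignores gem and enchant choices entirely), plus a hypothetical cost of 10 seconds per simulation.

```python
SLOTS = 16            # assumed number of gear slots considered
ITEMS_PER_SLOT = 3    # "just 3 relevant items to pick from in each slot"
SECONDS_PER_SIM = 10  # hypothetical cost of a single simulation

# Each slot multiplies the number of possible gear sets by ITEMS_PER_SLOT
combos = ITEMS_PER_SLOT ** SLOTS
cpu_years = combos * SECONDS_PER_SIM / (365.25 * 24 * 3600)

print(f"{combos:,} gear sets, about {cpu_years:.1f} CPU-years to sim them all")
# prints "43,046,721 gear sets, about 13.6 CPU-years to sim them all"
```

Adding even two choices each for gems and enchants multiplies that count again, which is why a statistical model is needed rather than brute force.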

We use the global network to generate millions of data points with which to do our analysis, but that is still just a small fraction of all the relevant gear combinations.

Sienss · February 4, 2020, 8:07am

Would it improve the stability of the optimizer's suggested gear and the scoring function if AMR's servers ran the latest saved gear for each user, adding it as a data point to the scoring function?

If AMR's servers aren't running at full capacity when no global network jobs are running, and you don't have variable loads on your Azure contract, maybe try something like this:
Target server downtime.
Target users who haven't modified their gear in the last 6 hours or so, run their gear, and backpropagate the results into the scoring function.
If the server load is still under a certain threshold over a few days/weeks, try the items they have in their bags? :x

Swol · February 5, 2020, 1:42am

Yeah, we could always go nuts and continuously add data points into the mix with the global network…

We’ve thought about stuff like that in the past. The problem is that we carefully pick the data points we run to give us what we need to create the scoring function. Throwing more or less random extra data points into the mix won’t necessarily help score that particular set of gear.