Local AI model claims to beat GPT-5.5 and Opus 4.7

The Nex team (https://github.com/nex-agi/Nex-N2) claims to have beaten GPT-5.5 and Anthropic's Opus 4.7 in at least one benchmark with a relatively small open-source model. The model is certainly worth trying, considering that it can run 100% locally.

The model has been available for a few days in GGUF format (Q4_K_M quantization) here: https://hugston.com/models/hugston-nex-agi-nex-n2-proq4-k-m

A closer look at the benchmark results:

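| Benchmark | Nex-N2-mini | Nex-N2-Pro | GPT-5.5 | Opus 4.7 | Kimi-K2.6 | GLM-5.1 | MiniMax M3 | DeepSeek-V4-Pro |
|---|---|---|---|---|---|---|---|---|
| **Agent** | | | | | | | | |
| BrowseComp | 74.1 | 83.7 | 84.4 | 79.8 | 83.2 | 79.3 | 83.5 | 83.4 |
| GDPval | 1402 | 1585 | 1769 | 1753 | 1481 | 1535 | - | 1554 |
| Toolathlon | 33.3 | 51.9 | 55.6 | 52.8 | 50.0 | 40.7 | - | 51.8 |
| WildClawBench | 47.7 | 53.5 | 58.2 | 62.2 | - | 48.2 | - | 43.7 |
| WideSearch | 62.0 | 75.6 | - | - | 80.8 | - | - | - |
| TAU3 | 65.9 | 71.1 | - | - | - | 70.6 | - | - |
| **Coding & SWE** | | | | | | | | |
| SWE-Bench Pro | 50.2 | 58.8 | 58.6 | 64.3 | 58.6 | 58.4 | 59.0 | 55.4 |
| Terminal-Bench 2.1 | 60.7 | 75.3 | 83.4 | 69.7 | - | 58.7 | 66.0 | 72.0 |
| DeepSWE | 8.0 | 33.6 | 70 | 54 | 24 | 18 | - | 8 |
| SWE-Bench Verified | 74.4 | 80.8 | 82.9 | 87.6 | 80.2 | - | 80.5 | 80.6 |
| SWE Atlas QnA | 31.5 | 37.9 | 45.4 | 45.2 | - | - | 37.9 | - |
| SWE Atlas RF | 30.0 | 32.9 | 44.8 | 48.6 | - | - | - | - |
| SWE Atlas TW | 23.3 | 40.0 | 42.6 | 38.2 | - | - | 30.8 | - |
| **General & Reasoning** | | | | | | | | |
| GPQA Diamond | 82.6 | 90.7 | 93.6 | 94.2 | 90.5 | 86.2 | - | 90.1 |
| IFEval | 89.1 | 94.0 | - | - | 94.5 | 94.5 | - | 91.9 |
| Apex | 9.4 | 36.5 | - | - | 24.0 | 11.5 | - | 38.3 |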
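
As a rough sanity check of the headline claim, the sketch below copies the Nex-N2-Pro, GPT-5.5 and Opus 4.7 columns from the benchmark table and lists, for each rival, the benchmarks where Nex-N2-Pro's reported score is strictly higher. It assumes every benchmark is higher-is-better and skips rows where a score is missing (shown as a dash); the names `SCORES`, `RIVALS` and `wins_against` are purely illustrative.

```python
# Scores copied from the benchmark table; None marks a dash (no score reported).
# Tuple order: (Nex-N2-Pro, GPT-5.5, Opus 4.7)
SCORES = {
    "BrowseComp": (83.7, 84.4, 79.8),
    "GDPval": (1585, 1769, 1753),
    "Toolathlon": (51.9, 55.6, 52.8),
    "WildClawBench": (53.5, 58.2, 62.2),
    "WideSearch": (75.6, None, None),
    "TAU3": (71.1, None, None),
    "SWE-Bench Pro": (58.8, 58.6, 64.3),
    "Terminal-Bench 2.1": (75.3, 83.4, 69.7),
    "DeepSWE": (33.6, 70, 54),
    "SWE-Bench Verified": (80.8, 82.9, 87.6),
    "SWE Atlas QnA": (37.9, 45.4, 45.2),
    "SWE Atlas RF": (32.9, 44.8, 48.6),
    "SWE Atlas TW": (40.0, 42.6, 38.2),
    "GPQA Diamond": (90.7, 93.6, 94.2),
    "IFEval": (94.0, None, None),
    "Apex": (36.5, None, None),
}

# Index of each rival's score inside the tuples above.
RIVALS = {"GPT-5.5": 1, "Opus 4.7": 2}


def wins_against(rival: str) -> list[str]:
    """Benchmarks where Nex-N2-Pro's reported score is strictly higher than the rival's.

    Rows with a missing score on either side are skipped. Every benchmark is
    treated as higher-is-better, which is an assumption, not something the post states.
    """
    col = RIVALS[rival]
    return [
        name
        for name, row in SCORES.items()
        if row[0] is not None and row[col] is not None and row[0] > row[col]
    ]


if __name__ == "__main__":
    for rival in RIVALS:
        print(f"Nex-N2-Pro > {rival}: {wins_against(rival)}")
    both = set(wins_against("GPT-5.5")) & set(wins_against("Opus 4.7"))
    print(f"Higher than both: {sorted(both)}")
```

With the scores as posted, Nex-N2-Pro comes out ahead of GPT-5.5 only on SWE-Bench Pro (58.8 vs 58.6), and ahead of Opus 4.7 on BrowseComp, Terminal-Bench 2.1 and SWE Atlas TW; no single benchmark shows it ahead of both.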

This model was converted and quantized with QUANTA (made by the Hugston team).

We would like feedback from users. Is this model as good as it claims to be?