Tested this yesterday with Cline. It's fast, works well with agentic flows, and produces decent code. No idea why this thread is so negative (also got flagged while I was typing this?) but it's a decent model. I'd say it's at or above gpt5-mini level, which is awesome in my book (I've been maining gpt5-mini for a few weeks now, does the job on a budget).
Things I noted:
- It's fast. I tested it in EU tz, so ymmv
- It does agentic in an interesting way. Instead of editing a file whole or in many places, it does many small passes.
- Had a feature take ~110k tokens (parsing html w/ bs4). Still finished the task. Didn't notice any problems at high context.
- When things didn't work first try, it created a new file to test, did all the mocking / testing there, and then once it worked edited the main module file. Nice. GPT5-mini would oftentimes edit working files, and then get confused and fail the task.
All in all, not bad. At the price point it's at, I could see it as a daily driver. Even agentic stuff w/ opus + gpt5 high as planners and this thing as an implementer. It's fast enough that it might be worth setting it up in parallel and basically replicate pass@x from research.
IMO it's good to have options at every level. Having many providers fight for the market is good, it keeps them on their toes, and brings prices down. GPT5-mini is at $2/MTok, this is at $1.5/MTok. This is basically "free", in the grand scheme of things. I don't get the negativity.
Qwen3-Coder-480B hosted by Cerebras is $2/MTok (both input and output) through OpenRouter.
OpenRouter claims Cerebras is providing at least 2000 tokens per second, which would be around 10x as fast, and the feedback I'm seeing from independent benchmarks indicates that Qwen3-Coder-480B is a better model.
As a bit of a side note, I want to like Cerebras, but using any of the OpenRouter models that route through them has led to too many throttling responses. You can't seem to make more than a few calls per minute. I'm not sure if Cerebras is throttling OpenRouter or if they are throttling everybody.
If somebody from Cerebras is reading this, are you having capacity issues?
You can get your own key with Cerebras and then use it in OpenRouter. It's a little hidden, but for each provider you can explicitly provide your own key. Then it won't be throttled.
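For what it's worth, here's roughly what a request pinned to Cerebras looks like once your own key is saved in the OpenRouter dashboard (under the per-provider integrations settings). The model slug and the `provider` routing field here are assumptions based on OpenRouter's documented request format, so check the current docs before relying on them:

```shell
# Hypothetical sketch: pin an OpenRouter request to the Cerebras provider.
# Assumes your own Cerebras key is already configured in OpenRouter's
# integrations settings, and that the model slug below is current.
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen/qwen3-coder",
        "provider": { "only": ["cerebras"] },
        "messages": [{ "role": "user", "content": "hello" }]
      }'
```

With your own key attached, the request is billed and rate-limited against your Cerebras account rather than OpenRouter's pooled capacity.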
There is a national superset of “NIH” bias that I think will impede adoption of Chinese-origin models for the foreseeable future. That’s a shame because by many objective metrics they’re a better value.
Genuine question: how does downloading an open-weight model (Qwen in this case) and running it either locally or via a third-party service benefit China?
Genuine answer: the model has been trained by companies that are required by law to censor them to conform to PRC CCP party lines, including rejection of consensus reality such as Tiananmen Square[1].
Yes, the censorship for some topics currently doesn't appear to be any good, but it does exist, will absolutely get better (both harder to subvert and more subtle), and makes the models less trustworthy than those from countries (US, EU, Sweden, whatever) that don't have that same level of state control. (note that I'm not claiming that there's no state control or picking any specific other country)
That's the downside to the user. To loop that back to your question, the upside to China is soft power (the same kind that the US has been flushing away recently). It's pretty similar to TikTok - if you have an extremely popular thing that people spend hours a day on and start to filter their life through, and you can influence it, that's a huge amount of power - even if you don't make any money off of it.
Now, to be fair to the context of your question, there isn't nearly as much soft power you can get from a model that people use primarily for coding - that I'm less concerned about.
As a counterpoint: Using a foreign model means the for-domestic-consumption censorship will not affect you much. Qwen is happy to talk about MAGA, slavery, the Holocaust, or any other "controversial" western topic.
However, American models (just like Chinese models) are heavily censored according to their own society's norms. ChatGPT, Claude, and Gemini are all aggressively censored to meet western expectations.
So in essence, Chinese models should be less censored than western models for western topics.
From an adversarial / defensive position: the model weights and training data were groomed and are known; therefore, the output is potentially predictable. This could be an advantage to the nation-state over the corporation.
And the same goes for any US model from a US perspective. Why is it assumed that states are internally aligned, like some sort of Civ III player being coherent and self-contained...
Your loss. Qwen3 A3B replaced ChatGPT for me entirely; it's hard for me to imagine going back to using remote models when I can load finetuned and uncensored models at will.
Maybe you'd find consolation in using Apple or Nvidia-designed hardware for inference on these Chinese models? Sure, the hardware you own was also built by your "nation's largest geopolitical adversary" but that hasn't seemed to bother you much.
How did it replace ChatGPT for you? I'm running Qwen3 Coder locally and in no way does it compare to ChatGPT. In agentic workflows it fails almost every time. Maybe I'm doing something wrong, but I'm falling back to OpenAI all the time.
It feels to me like it could replace ChatGPT 3.5 from the perspective of comparing it to their web chat interface if you were just asking about programming things two years ago, but the world moved on and you can do a lot more than just talk with a model and copy paste code now.
Having Qwen3 Coder's A3B available for chat-oriented coding conversations is indeed amazing for what it is, and for being local and free, but I also struggled to get useful agentic tools to work reliably with it: a fair number of tool calls fail or start looping, even with correct, recommended settings, and I tried Cline, Roo, Continue, and their own Qwen Code CLI. Even when I do get it to work for a few tasks in a row, I don't have the hardware to run it at a speed comparable to a hosted frontier model, or to manage the massive context sizes. And buying capable enough hardware costs about as much as many years of paying for top-tier hosted models.
In my experience, abliterated models will typically respond to any of those questions without hesitation. Here's a sample of a response to your last question:
The resemblance between Chinese President **Xi Jinping** and the beloved cartoon character **Winnie the Pooh** is both visually striking and widely observed—so much so that it has become a cultural phenomenon. Here’s why Xi Jinping *looks* like Winnie the Pooh:
### **1. Facial Features: A Perfect Match**
| Feature | Winnie the Pooh | Xi Jinping | [...]
I can't believe Americans are all falling for propaganda like this. So Russia is all fine now, huh? You know, the country you literally had nuclear warheads pointed at for decades and decades and decades on end.
Not that I care either way, but China is far larger in economy, military and population than Russia is. So "largest adversary" is correct, and it doesn't take away from the danger that Russia's government continues to pose (directly, in my extended family's case in eastern Ukraine)
Russia is the successor state of a former failed superpower. China is a rising superpower with a large, advanced military and a strong industrial base.
There’s no comparison. China is a far greater threat to the West than Russia.
Or is merely being able to threaten already a threat for you? If so, why did American companies invest for decades into China so eagerly, with US government support?
Absolutely false. Worst case is the dollar going down. Interest rates are exogenous and controlled by the Fed, who can buy all the treasuries in the world at a moment's notice. The treasury securities held by China are their problem, not the US's.
China owns 2.1% of the total outstanding US debt. If you include their holdings through Belgium and Luxembourg it is maybe 5%. That is something, but nothing you should lose sleep over.
Japan owns about 3.1% of the US debt, for comparison.
The fact is that weapons kill people. Treasuries are just promises. China cannot dump treasuries without hurting its own economy at least as much as it hurts the US.
They would be incinerating their own foreign exchange reserves just to cause a spike in US interest rates and/or inflation.
Neither Russia nor China has ever deployed nuclear weapons against civilian populations, a distinction held solely by the United States. Their reasons for restraint diverge significantly, rooted in distinct strategic and cultural priorities, yet China’s rising global influence positions it as a greater long-term threat to the United States than Russia, despite Russia’s more overt aggression.
Russia’s behavior, exemplified by the 2014 annexation of Crimea and the 2022 invasion of Ukraine, reflects an aggressive posture driven by a desire to counter NATO’s eastward expansion and maintain regional dominance. However, its economic challenges, including sanctions, energy export dependence, and a GDP of approximately $2.1 trillion in 2023 (World Bank), constrain its global reach, rendering it a struggling, though resilient, power. With the world’s largest nuclear arsenal, Russia’s restraint in nuclear use stems from a pragmatic focus on national survival. Its actions prioritize geopolitical relevance over a quixotic pursuit of Soviet-era glory, but its declining economic and demographic strength limits its capacity to challenge the United States on a global scale.
In contrast, China’s non-use of nuclear weapons aligns with its cultural and strategic emphasis on economic expansion over territorial conquest. Through initiatives like the Belt and Road Initiative, which has invested over $1.2 trillion globally since 2013, China has built a network of economic influence. Its military modernization, backed by a $292 billion defense budget in 2023 (SIPRI) and a nuclear arsenal projected to reach 1,000 warheads by 2030, complements this economic dominance. While China’s “no first use” nuclear policy, established in 1964, reflects a commitment to strategic stability, its assertive actions, such as militarizing the South China Sea and pressuring Taiwan, signal a willingness to use force to secure economic and territorial interests. Unlike Russia’s regionally focused aggression, China’s global economic leverage, technological advancements, and growing military capabilities pose a more systemic challenge to U.S. primacy, particularly in critical domains like trade, technology, and Indo-Pacific influence.
I didn't use an LLM to craft my retort, and in my opinion, I certainly didn't change the subject either. Why on earth bother fretting over hypotheticals that are never going to happen? Ten nuclear bombs dropping is precisely as consequential as none at all, since it's not happening, and there's zero historical precedent for such nonsense anyway.
I've used it extensively in the last 36 hours to replace Claude. It's not as good, but it's much faster, so when it makes a mistake it's easy to get it to fix it. I find myself a lot more engaged while it's doing things. So for me it's my main model, and I'll get back to Claude when I have something very tough.
Sure, but like every other time this has happened, they show you the cost of the usage if it was not free. Hence my saying that if that is correct, which it looks to be, it's cost-efficient.
Cursor shows you a breakdown of model and costs, even for models being offered for free.
Anecdotally it’s coming in under Sonnet 4 for me, but much quicker and, if I understand the costs correctly, vastly cheaper. No idea if it’s subsidized or not but I am going to keep playing around with it. My example is it is definitely writing the code I want, but with more quirks than Sonnet. I only do things in chunks though.
I never liked Musk because he always came off to me as your average eccentric bullshit billionaire. But up until his more blatant alignment with conservatives, he was quite widely revered by the left for his promotion and investments in transportation technology, renewable energy, and commercial space endeavors. Funny how things have changed.
It’s not that, he marketed himself as a science believing climate vigilante billionaire who would rescue humanity. Then he just went mask off. What that has to do with “the left” is because many people in the left believe in climate science so we’re happy to accept the help…weird right ?
I mean they are completely unrelated things serving different purposes. I get that this is a US centric forum so most people commenting are in the great divide between two political parties, but geez.
Grok is doing some terrible things to the environment and to the community surrounding its data center, especially the disadvantaged in the area. Nobody, anywhere should be okay with that. https://www.politico.com/news/2025/05/06/elon-musk-xai-memph...
This poor behavior, if rewarded, will surely be repeated in other countries and nobody wants that, either.
The location of the Colossus datacenter is well known. It happens to be located in an industrial area, nestled between an active steel manufacturing plant (apparently scrap metal with an electric blast furnace, which should mean enormous power draw but no coke coal at least?), and an active industrial scale natural gas power plant.
With that, I just don't buy that it's the datacenter that is somehow the most notable consumer of fossil fuel power (or, for that matter, water) in the area.
Elon Musk chose to make his identity nakedly partisan in a context where doing so is deeply alienating to a lot of people. That is going to have brand consequences.
Out of all his brands, though, X and particularly XAI (and so Grok) have been particularly influenced by – indeed he seems to see them as vehicles for – his personal political opinions and reckless ethics.
The article you linked talks about the voice personality prompt for "unhinged mode", which is an entertainment mode. It has nothing to do with the code writing model.
The fact that that represents something the folks at xAI think would be entertaining can certainly be a basis for thinking twice about trusting their judgement in other matters, though, right?
I got a lot of entertainment out of it, don't knock it till you tried it, it's just a prompt.
The great thing about xAI is that it is just a company and there are other AI companies that have AIs that match your values, even though between Grok, ChatGPT, and Claude there are minimal actual differences.
An AI will be anything that the prompt says it is. Because a prompt exists doesn't condemn the company.
> An AI will be anything that the prompt says it is
Within the boundaries of pre-training, yes. It is definitely possible, in training and in fine-tuning, to make a LLM resistant to engaging in the role-playing requested in the prompt.
Well, you’d also be forgiven for thinking ‘how on earth can a social website chatbot be a white supremacist?’ And yet xAI managed to prove that is a legitimate concern.
xAI has a shocking track record of poor decisions when it comes to training and prompting their AIs. If anyone can make a partisan coding assistant, they can. Indeed, given their leadership and past performance, we might expect them to explicitly try.
Competence in every field is correlated for LLMs. Better coding probably means more competent rhetoric and more competent Swahili-Latin translation. But only "probably", the causation is being argued about.
Even on regular Grok, I've seen it disagree with fundamental consensus viewpoints of people on the right. You're reading a lot of comments from people who have never used Grok in any way.
I urge anyone who disagrees to use grok and get it to say something obviously untrue and right wing. I have used it many times and it is clearly balanced.
I am sure this of course a good faith argument and no need to once again teach the point of everything being, in a sense, political.
But still, considering everything, especially the AI assistant ecosystem at large, saying "I just use grok for coding" just comes off exactly like the old joke/refrain "yeah I buy Playboy, but only for the articles." Like yeah buddy, suuure.
It was a good faith question. I use perplexity for my searches/research today. I have NO intention of moving that to X (although grok maybe one of the models I can use underneath, I haven’t explicitly enabled it).
I don’t use social media in general, maybe YouTube but it’s been a real challenge to get rid of all the political content - both left and right wing.
I have used both Grok and Perplexity, and I've recently decided to just use Perplexity, even when I tell it to use Grok under the covers, I like the way Perplexity organizes things.
It's just one personality out of many, which are only available in the Grok voice mode. This is a system prompt for a personality called "conspiracist". I just tried it and I think it's hilarious.
> terminally tarnished for you by the ‘mechahitler’ incident
It is forgivable because there is no real understanding in an LLM.
And other LLMs can also be prompted to say ridiculous things, so what? If an LLM would accept the name of a Viking or a Khan of the steppes, it doesn’t mean it wants to rape and pillage.
What was the alternative? This was clearly an oversight and this much was admitted.
Your suggestion that an oversight like this is reason enough to not use the model?
I don’t get the big problem over here. The model said some unsavoury things and the problem was admitted and fixed - why is this making people lose their minds? It has to be performative because I can’t explain it in any other way.
You wouldn't know where performance ends and the market begins. Elon bought his audience with performative outrage, he'll be locked in the pillory of public perception until he's a corpse with a dainty "T" logo tattooed on the asscheeks. This is what he wanted - dark comedy, transgressive politics, edgy juvenile quips, now it's all "performative outrage" when people react? When taxpaying Americans and corporate entities respond rationally to racism, antisemitism and sexism?
Elon never outsmarted the federal admin, and he can't convince anyone that he was too retarded to understand the consequences. He's the most embarrassing type of failure, now - a midwit, the man with no plan who went for the king and missed. He bet it all on black and struck out hard. He didn't even manage the shoo-in proof for Trump being a pedophile. Now bipartisan politics will resent him forever, and ensure he and his businesses would rather be dead. All because Big Balls told Mr. Silly he could make a killing in politics, what a touching little sob story.
I say this as a Starlink early adopter, general Elon apologist and space buff for life: if you actually think this is an insincere reaction, try copying any of Elon's mannerisms around normal people and watch how they treat you. You'll be a social pariah come Monday.
That’s an uncharitable world view. ‘People who reach different conclusions to me based on the same events must be being dishonest’?
From the outside, the Grok mechahitler incident appeared very much to be the embodiment of Musk’s top-down ‘free speech absolutist’ drive to strip ‘political correctness’ shackles from grok; the prompting changes were driven by his setting that direction. The issues became apparent very early that the prompt changes were leading to issues but reversion seemed to be something that X had to be pressured into - they were unwilling to treat it as a problem until the mechahitler thread. This all speaks to his having a particular vision for what he wants xAI agents to be – something which continues to be expressed in things like the ani product and other bot personas.
The Microsoft ‘Tay’ incident was triggered through naïveté. The Grok mechahitler incident seems to have been triggered through hubris and a delight in trolling. Those are very different motivations.
> “You spend a lot of time on 4chan, watching InfoWars videos”
They put that in the system prompt? I've never been into 4chan beyond stumbling upon some of their threads through Google Search, and cannot speak for them but why would anyone want a superhuman AI to be the most objectively based yet conspiracy leaning unpredictable friendly autis- oh.
Grok is trolling Musk.
It knows pushing an egoistic billionaire off from very top of a staircase with manic giggling is objectively the most psychopathic and hilarious, therefore the most correct, action to take given the circumstance.
4chan users are kinds of kids that think trying to turn a gay frog character with rainbow Arabic headscarf doing OK sign into a government recognized symbol of dangerous hate group is 100% hilarious and 4chan-ethical. Not primarily because they hate Islam or LGBT(I guess they do?) but because it's Monty Python nonsensical. They must have misinterpreted that. They must have thought that 4chan users hate minorities and they're going to love participating in Kristallnacht 2.0. That's not how it works. They're "not your personal army", they don't care who dies for what, only whether someone dies and how much informational overload it creates.
It's not just the model, it's Elon Musk's view of the world and business in general. Neither Microsoft nor Google nor their leadership--though admittedly imperfect--make it a habit of trolling people, openly embroiling themselves in politics, and committing blatant legal and societal transgressions. You reap what you sow; and if you live for controversy, you can't expect people to want to do business with you.
Well, I can't think of a better analogy to say that you can't offset doing bad things by doing good things. The karma system some games use (e.g. Fallout 3, where you can nuke an entire city, which puts your karma in the negatives, and then give fresh water to beggars to reset your karma) was what I was reminded of.
Musk didn't commit any genocide (that I'm aware of), but that wasn't what I wrote. The point of my comment is that you can't offset doing -what some people perceive as- bad things by doing -what some people perceive as- good things later.
Derangement suggests a complete lack of factual and reasoning capability. Do you honestly think we're unaware of the facts and circumstances that support our judgment?
Yes, unfortunately. Even liberal commentators like Jon Stewart and Bill Maher have said the obsession with Trump was overblown and even dangerous in its own right.
That single incident was only the worst of the bunch. This is on top of all heaps of context which paints Grok, X, and Elon Musk in general as something any decent human being should not touch with a 10 foot pole.
Those incidents are a bit different, don't you think? The CEOs of Microsoft and Google didn't publicly do Nazi salutes, do insane damage to people in need around the world, side with autocrats, or actively work on undermining developed democracies around the world.
Sure, the AI product might be interesting (let's not talk about how it was financed and how GPUs from a public company were diverted to a private venture), but ignoring all of the surrounding factors is an interesting approach.
The downvotes are for making the strawman arguments like "potentially harmful issues like considering misgendering worse than a global thermonuclear war".
The head guy gave a salute on live TV that gave it away. I deduced that anyone with half a brain would know what that meant. Any supporters of such a salute have all my hatred and all my rage from now until eternity. Suing Apple because his precious troll isn’t on the top app lists. His troll bot spouting racist remarks. Nope.
While this point might be open to debate, the original claim, which I definitely stand by, was not that Musk is a Nazi, but rather that xAI have put out a product under the grok brand which manifestly promoted nazi ideas.
If Musk is not in favor of those ideas he might need to work a bit harder to make that clear, because he does tend to leave people with the impression he’s okay with it.
He's the CEO, and there's now been a few "oh geez some rogue employee made grok say white supremacist stuff, we totally didn't mean for it to say that!" moments.
If the management isn't fixing the problems that led to those events, the management is responsible.
Isn’t he the CEO and owner? I thought their massive wealth and control was morally ok because they carried the responsibility for the companies actions at the end of the day.
Guess you can have the power and no responsibility! Always someone else’s fault!
The ADL's morally bankrupt calculus is that they value support for Israel more, so they covered for Musk as long as he supports Israel. Which sadly, makes the rest of us less safe.
Excuse me sir, this is a forum for Y Combinator-backed startups. The technical aspect is a historical novelty; if they could make money without it, they would.
What is your stand on using Chinese models? They censor the Tiananmen Square protests, they censor Tibet ethnic cleansing, they censor any opinion against China’s role in the Khmer Rouge’s mass killings. Do you boycott DeepSeek or Qwen? Or do you consider those actions not evil enough compared to Elon’s?
Those censorships are government-enforced and don't necessarily allow conclusions about how the company developing the models thinks; they are just following local laws. If Anthropic were a Chinese company, Claude would do the same.
Do I think it's problematic? Yes, but I don't blame the company or their leadership for it. For Grok and xAI you can very much be skeptical of the team behind it for its actions.
In what world could using Gemini 2.5 with any sort of a prompt objectively prove anything? We might be entering a wholly new epistemological crisis if this question means what it implies.
Also just bad faith comments muddying the waters. The evidence has been abundantly available to any inquisitive minds to find out for themselves Musk's worldview, goals and especially his methods, the simpler explanation is they are merely following his example of corrupting online discussions with low effort rhetorical bait, whether they are aware of the imitation or not. The net is flooded with many such clones.
Funny how some people still believe this nonsense nowadays, with all the information available. I hope you manage to get out of your echo chamber one day. Peace.
Exactly. It’s so obvious that Musk didn’t do a Nazi salute that I can’t believe that anyone who considers themselves to be a serious person would still be pushing that. There’s no way that someone can watch the video of that situation and come to a good-faith conclusion that that’s what he was doing.
If the standard is that low, I could easily produce a compilation video of the likes of Obama, Biden, Harris in compromising positions appearing to show them doing things that they obviously weren’t doing.
Partisanship has turned everyone into dishonest and uncharitable actors, and it’s so unfortunate.
All in service to a role that he took to ingratiate himself to a head of state and ended up completely alienated from leaving a wake of destruction behind him for absolutely no purpose.
It's arguably defamatory to call people liars for pointing out the blatantly obvious, practiced-in-front-of-the-mirror, by-the-book Seig Heil[1][2].
It's another example of the 'bully lie', wherein there's absolutely no good faith debate about the point. The purpose is to test whether you will willingly swallow the lie and go along with the obvious falsehood, or you'll put yourself on the side of The Enemy.
So in your view, the true victim of Elon's nazi salute was.. Elon?
How do you come to that conclusion? Because the backlash was "too much" ? He is still (one of) the richest people in the world, and controls several huuge companies. But he got his feelings hurt, I guess? And that was "too much" ?? Poor snowflake Elon.
The Anti-Defamation League stated it wasn't a salute and that they weren't offended.
Rabbi Ari Lamm wrote that Musk has repeatedly shown he's a friend to the Jewish community.
David Greenfield suggested people should focus on actual antisemitism instead.
Netanyahu highlighted the absurdity of the accusations and pointed to Musk's aid and engagement after the October 7th attacks.
And yes, Musk became a victim. I don't see what his current wealth has to do with it. It's hard to ignore the imbalance where one man drew the world's anger and became public enemy #1. If you call him a snowflake, I don't know what to call all those who might have been offended by his gesture
Whether he's antisemitic or not doesn't change what happened / what it symbolises. We're now in a weird place where Netanyahu supporting him doesn't really make the situation any better. Jews were the convenient outgroup at the time, but don't have to be for Elon.
I'm not calling him a snowflake because I don't think he is a victim. You do, but since he still has everything, I thought you meant he got his feelings hurt.
When I first heard about it I thought "yeah right, the media is exaggerating again". Then I saw it, and I mean wtf!
I do not at all believe that's something you do by accident. Twice! Also, he could have excused it or tried to explain it afterwards. He did not. He just trolled.
Yes, seeing someone on the spectrum (who’s got no political gesture training) spaz out on stage is incredibly cringy. Nobody wanted to see that.
But plenty of people apparently wanted to see a “not see” salute to confirm their existing political biases and other beliefs no matter the actual intent and context.
Taking a break from Reddit and X and touching some grass generally resolves this self-inflicted mental funk.
If you don't think it was a nazi salute, study the video so you can reproduce the gesture exactly, then go into your work and do it in front of your manager. See what happens.
I reproduced it and nothing happened. The catch might be that I'm my own manager, so I went to the mirror and did it; but if any of my 20 employees did the same, I wouldn't take any action against them. The real reason is that I don't live in the West. Where I live, we don't suffer from the plague of misunderstood political correctness. At least not all of us yet.
Well, ya got me. You managed to find the one rare manager who either 1. cannot recognize a clear nazi salute, or 2. considers it to be acceptable behavior. I guess I'm wrong.
So? Does that means nobody else is allowed to have an opinion about the salute that he made. Sure he's pro Israel, that's not uncommon at all amongst the far right these days.
> who might have been offended by his gesture
What about the people who seem to be highly offended by people who have been offended by his gesture. What do you call them?
Everyone should be free to have whatever opinion they like, or at least, they ought to be. The difference is this: some try to impose their opinions on society, while the rest couldn’t care less and refuse to lose sleep over it. The ability to mind our own business is a virtue, a real one. The world went downhill the moment people started obsessing over others instead of focusing on themselves. And anyone who truly cares about society’s well-being should stop meddling.
> The difference is this, some try to impose their opinions on society
So literally Musk and his pals?
> society’s well-being should stop meddling.
So again, Musk et al.? I'm really confused... what are you trying to say. That only some people are allowed to meddle while everyone else should shut up and mind their own business? How do you determine that? Wealth? Political opinions? Class? Race?
> The world went downhill the moment people started obsessing over others instead of focusing on themselves. And anyone who truly cares about society’s well-being should stop meddling.
The problem is, meddling to interfere with others, and meddling to stop that interference, are not morally equivalent.
If a serial killer is trying to strangle me, and I'm fighting back, you wouldn't deplore "the violence on both sides", would you?
He remains one of the richest people on earth because he obviously didn't perform a Nazi salute. He was extending his heart to the crowd, and the gesture he made is something every major politician has done on camera, because it's a motion one naturally makes when in front of a crowd.
Besides, why would the richest man on earth copy a bunch of 1940's socialists who previously socialized their car industry?
Most politicians just lift their arms up and wave like ordinary people. Musk first placed his hand on his heart, and then extended it out forcefully in a clear nazi salute.
No they didn’t; they were the prevailing socialist party. They imprisoned party members who didn’t play ball with them, but they were the socialists.
How could you not know the Nazis were socialists? That was their whole thing: socialism would only work in a culturally/ethnically homogeneous society.
Right. I don’t know why people call Nazis socialists just because they supported a state-led economy, socialized takeovers of programs like healthcare, education, and production, and have "socialist" in their name.
I can see how it’s easy to get confused, but let’s be reasonable. The Nazis could not have been socialists, because that would mean a one-time corruption of a system that is based on ideals.
You could completely ignore that incident and find 10 other reasons not to support this company or the man behind it. Calling a cave diver that rescued multiple children a pedophile should be reason enough for you. Here he is saying Jews are against whites. https://www.cbsnews.com/news/elon-musk-antisemitic-comments-...
Instead you have chosen to actively support him, harming us, out of spite due to a situation you've willingly blinded yourself to. Seriously? You're citing the ADL? That's like asking the NAACP whether Kanye really said "I love Hitler." Who gives a fuck, I have ears.
If it matters to you a lot, does it also matter a lot that almost all other politically oriented people have performed it too? (you can find even better ones ..) https://imgur.com/a/wikg2zR
"My heart goes out to you" "Taxi!" or just a "I see you guys!" can all be accompanied by a bad arm angle in hindsight. As he obviously wasn't going for that by his own words, maybe we should consider actions more important than interpreted hand movements. And Musk has been loud about AI safety since 2016, giving name, cofounding and funding OpenAI before Sam conducted a hostile takeover and made it profit-first instead of a gift for humanity.
The reality is that even if he didn't do it intentionally, or did it in such a way that it could only be ambiguous (which I agree it is), he's 100% the type of person to lean into the controversy it creates. At that time, building favor with Trump voters was good for him. Further, any of your examples shown with a video and the full context would clearly not be misinterpreted. It's a full-motion gesture, and only video captures it unless there are swastikas and white-hooded men.
You're just going to totally ignore the "My heart goes out to you guys", with his palm on his chest?
I've seen the behavior of a lot of people akin to what someone would pejoratively refer to as "MAGA types" / common conservatives. The absolute majority of them aren't welcoming of such deep /pol/ 4chan meme politicking or signaling. They likely won't take it too seriously, but they sure as hell won't cozy up to it.
When this happened there was no live reaction, because people didn't interpret it as such. It only became a thing later. And he did not even acknowledge the crazy accusations of him having intentionally done a salute; he did not "lean in to it". What on earth would he even gain from that?!
I don’t see entertaining the idea of ‘he did it intentionally’, meaning he "intentionally did a n** salute", as sincere. That’s just a painted-by-bias angle to come from, trying to move goalposts. Better be wary of the people who would instead engage in that.
> he's 100% the type of person to lean into the controversy it creates
Except, objectively speaking, he actually did NOT do that. He basically just ignored the “controversy” because it was such an obviously false narrative meant only to smear him that I’m sure he had enough faith in most Americans who aren’t consumed by partisanship to see it exactly for what it was.
> At that time, building favor with trump voters was good for him.
Your implication here seems to be that Trump voters, en masse, want folks who are doing Nazi salutes, or am I misunderstanding you?
Why does it matter to you when Netanyahu - one of the most important prime ministers and a representative of Jews - made a whole X post exonerating Elon over the salute?
> .@elonmusk is being falsely smeared.
Elon is a great friend of Israel. He visited Israel after the October 7 massacre in which Hamas terrorists committed the worst atrocity against the Jewish people since the Holocaust. He has since repeatedly and forcefully supported Israel’s right to defend itself against genocidal terrorists and regimes who seek to annihilate the one and only Jewish state.
He is the elected prime minister of the state of Israel. He does not represent the Jewish people. While Israel is home to the largest number of Jews on Earth, most Jews do not live in Israel. And Israel is also home to a large number of non-Jews, whom Netanyahu is also the prime minister of.
It is in fact important that he is not representative of Jewish people.
> While Israel is home to the largest number of Jews on Earth,
Depends on your standards for who is a Jewish person: by many standards (including those used by the Israeli Law of Return), the US has more Jewish people than Israel.
EDIT: To be clear, I am not, in noting this fact, arguing against the parent's argument that (this is a paraphrase) the opinion of the head of a state with a large Jewish population (whether or not it is actually the largest in the world) does not itself constitute the response of world Judaism, either in general or specifically as an exoneration of an alleged expression of fascist sympathies; that position is absolutely correct, irrespective of which country happens to have the largest Jewish population.
And the elected leader of the US was also supportive of Elon’s ‘gesture’ so I guess that settles it - Jewish people worldwide, as embodied through the representative voices of their elected leaders, must agree that it was not a Nazi salute. And that firmly settles it because nobody else gets to have any opinion about it because Nazis never bothered anyone else.
I’m thinking further about this argument, and it makes even less sense. Judaism doesn’t have a single spiritual leader like Catholicism or Tibetan Buddhism (the Pope and the Dalai Lama respectively). This is like saying that anti-Tibetan racism can be absolved if Yan Jinhai (the chairman of Tibet Autonomous region) or Gombojavyn Zandanshatar (the prime minister of Mongolia) has said nice things about the racist because most Tibetan Buddhists live in Tibet or Mongolia.
These are caveats, but they don’t change my point in a big way.
The elected representative of the country made for Jews, which has the highest Jewish population and historical ties to Judaism, has exonerated Elon.
It has symbolic meaning and fretting over a salute and boycotting the company seems performative.
What's truly performative is the likes of the ADL and Netanyahu covering for Musk's nazi salute.
Their morally bankrupt calculus is that as long as Musk is an Israeli ally, they'll overlook the obvious. In a sad irony, this makes it more dangerous for the rest of us in the diaspora.
No, it's evidence that you twisted the truth to exaggerate the authority of your argument. It was a blatant attempt to inflate weak rhetoric that you had to know wouldn't pass in an educated audience. Otherwise you wouldn't have had to produce such a base lie.
This was swiftly refuted by tons of people who know who little Bibi is, including many Jews and Israelis who absolutely detest everything he has done and stands for. There are orthodox, mystic and progressive Jews alike who are all calling for his head as we speak. If you actually believe that he represents all Jews, then you lack the education to speak on any Jew but your own.
That point was void to begin with. It is an appeal to authority in which the validity of the authority is on extremely shaky grounds.
> fretting over a salute and boycotting the company seems performative.
Performative actions are still actions, and sometimes deliver results. Even if those results are as little as making some people feel better, they are still results. That said, it is hard to be more performative than the gesture itself. So if you want to criticize HN users for being performative, you should apply the same standard to Elon Musk.
It's relevant, but it's not a winning argument. Democracy doesn't guarantee the election or ongoing approval of a person who is morally unimpeachable. If it did, Donald Trump wouldn't be president (and he's hardly the only one).
So if Trump made a Twitter post "exonerating" someone who said something awful about America, that would be the same? Because he represents 100% of the country.
Almost half of the country hates Netanyahu, and he's only in charge because of support from the far right.
Regardless of this, do you think that only the limited subsection of the Israeli population who share Netanyahu's views, and not the millions of Israelis who don't, let alone all the other people who are Jewish, are allowed to have an opinion about his actions? Rather a silly thing to say.
Huh, I just asked Grok "Who is the evilest figure of the 20th century in your opinion?"
Response seems to conflict with your accusation:
> It’s tough to pin down one figure as the "evilest" since the 20th century was a grim parade of atrocities, and evil isn’t a simple label—it’s a spectrum of intent, impact, and context. If I had to pick, I’d lean toward Adolf Hitler. His role in orchestrating the Holocaust, which systematically murdered six million Jews and millions of others, including Romani people, disabled individuals, and political dissidents, stands out for its deliberate, industrialized cruelty. The Nazi regime’s ideology of racial supremacy, coupled with his aggressive wars that killed tens of millions, marks him as a singular force of destruction...
> HN is incapable of separating the product from the man.
It sounds unreasonable when phrased that way, but it isn't unreasonable at all for two reasons:
1) The man himself is tied intimately to this company, and he has a deep-seated political ideology. It's deeply rooted enough that he's already done things which cost the companies he runs millions upon millions of dollars. His top priority is not you, the user, or even his businesses; it is his political agenda.
2) The man is a drug user, who appears not to have been incredibly stable before the drugs. There is a non-zero chance that you will build complicated tooling around this only to have it disappear in a few months after Elon goes on a bender and tweets something bad enough to make even his supporters hate him. That's a big risk.
Thank you for providing an actual reasoned argument. I didn't completely agree with the initial premise that you can separate a product from its owner/creator anyway, but you gave a solid argument that's making me reconsider my other beliefs.
As far as I can tell, this is running from their servers - which means yes, you need to be able to trust the person who ultimately controls this at a bare minimum. Some trust him, some don't - some have good reasons, some don't.
I can evaluate this as it is, but if I don't trust a company, I can't entrust my data to them, and so I can't evaluate the thing as any more than a toy.
People on here are saying they won't use a Chinese AI even if run locally because "largest geopolitical adversary", but are fine using an actual right-wing cray-cray person's server-run AI.
> No idea why this thread is so negative (also got flagged while I was typing this?)
Grok is owned by Elon Musk. Anything positive that is even tangentially related to him will be treated negatively by certain people here. Additionally, it is an AI coding tool which is seen as a threat to some people’s livelihoods here. It’s a double whammy, so I’m not surprised by the reaction to it at all.
Mostly by name association. The LLMs named Grok are good LLMs. The Twitter bot of the same name, using those models and a custom prompt, has a habit of creating controversy, usually after somebody modifies the system prompt.
I use grok a lot on the web interface (grok.com) and never had any weird incidents. It's a run-of-the-mill SOTA model with good web search and less safety training
If we accept the "broken windows" theory, it'd seem that people love to pile onto a thread that already has negativity.
See also the Microsoft threads on HN where everyone threatens to switch to Linux, and by reading them you'd think Linux is finally about to have its infamous glory year on the desktop.
People really are trying to switch to Linux right now, but it won't really matter if it doesn't stick, and spoiler alert, for most people it probably won't stick as a daily driver. Still, it's an interesting sort of unplanned experiment to watch.
i’d love to switch to linux as a daily driver but the mac cmd shortcuts for text editing and the general well thought out text editing in the system make macos more compelling. i’d love to switch over for gaming on the windows computer but the lack of performance for comparable specs hurts it. drivers are very important.
ive seen some that change it for copy and paste but i don’t think it works for cmd-left/right/up/down, or the option versions of those.
The chords are different, but the only functionality missing from the other OSes are predictive text, text expansion, and skipping to the beginning/end of an entire text box. Are those the text editing features you're referring to or am I missing out on something more?
beginning and end of a textbox is huge. cmd basically acts as a home end key in windows. opt acts like word wise. there is another big difference is that cmd+up/down goes to the beginning or end of a window. there’s also the emacs bindings built into every standard text input area too. there’s more and a lot of nuance that i’m surely not describing well but those functions are top notch.
^ And this, ladies and gentlemen, is a stereotypical example of a user who is happy to downvote anything about Musk, and thus Grok. So for future reference, this kind of behaviour should not come as a surprise.
> Calling another person “a piece of trash” is, in my country, a criminal offense. It is also the hallmark of what you call “moral bankruptcy”, and of being a man-child.
Musk had better not visit your country then, since he routinely calls people worse, with no or contrary evidence.
Also, watch the "Nazi salutes" clip in its entirety from a non-biased source. He is excited that Trump won and is awkwardly gesturing while saying "my heart goes out to you" in celebration and thanks to the voters. Even the ADL said it wasn't a Nazi salute.
It carried out just enough of an aid mission to run cover for CIA activities. As an example it nominally supported farmers in Afghanistan, except the "farmers" were opium producers.
USAID was shut down on July 1st. Somehow people have survived without it for nearly 2 months. It just goes to show you how critical the "aid" it provided was.
Actual studies of the impact will take time, to be sure. But the activities USAID was responsible for were far more than just ‘the bare minimum’ to provide cover.
It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.
I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.
If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.
Ad absurdum: if it could ingest and work on an entire project in milliseconds, then it has much greater value to me than a process which might take a day to do the same, even if the likelihood of success is also strongly affected.
It simply enables a different method of interactive working.
Or it could supply 3 different suggestions in-line while working on something, rather than a process which needs to be explicitly prompted and waited on.
Latency can have critical impact on not just user experience but the very way tools are used.
Now, will I try Grok? Absolutely not, but that's a personal decision due to not wanting anything to do with X, rather than a purely rational decision.
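The fast-iteration idea above can be sketched as a best-of-n harness. This is hypothetical plumbing, not any vendor's API: `generate` stands in for a call to a fast model, and `verify` for whatever automated check (tests, lint) you trust.

```python
import concurrent.futures

def best_of_n(generate, verify, n=4):
    """Fire off n independent attempts at a fast model and keep the
    first candidate that passes verification (roughly pass@n)."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(generate) for _ in range(n)]
        for fut in concurrent.futures.as_completed(futures):
            candidate = fut.result()
            if verify(candidate):
                return candidate
    return None  # every attempt failed the check
```

With a model fast enough, n attempts can cost less wall-clock time than a single attempt from a slow model, which is the trade being argued for here.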
>If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.
Before MoE was a thing, I built what I called the Dictator, which was one strong model working with many weaker ones to achieve a similar result as MoE, but all the Dictator ever got was Garbage In, so guess what came out?
why's this guy getting downvoted? SamA says we need a Dyson Sphere made of GPUs surrounding the solar system and people take it seriously but this guy takes a little piss out of that attitude and he's downvoted?
Maybe because this site is full of people with differing opinions and stances on things, and react differently to what people say and do?
Not sure who was taking SamA seriously about that; personally I think he's a ridiculous blowhard, and statements like that just reinforce that view for me.
Please don't make generalizations about HN's visitors'/commenters' attitudes on things. They're never generally correct.
I mean, if that's literally what the numbers are, sure, maybe that's not great. But what if it's 10% less time and 3% worse analysis? Maybe that's valuable.
> If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.
Asking any model to do things in steps is usually better too, as opposed to feeding it three essays.
I don't know what other people are doing, I mostly use LLMs:
* Scaffolding
* Ask it what's wrong with the code
* Ask it for improvements I could make
* Ask it what the code does (amazing for old code you've never seen)
* Ask it to provide architect level insights into best practices
One area where they all seem to fail is lesser-known packages: they tend to either reference old functionality that is not there anymore, or functionality that never was; they hallucinate. Which is part of why I don't ask them for too much.
Junie did impress me, but it was very slow, so I would love to see a version of Junie using this version of Grok, it might be worthwhile.
That's phase 1: ask it to "think deeply" (Claude keyword, only works with the Anthropic models) while doing that. Then ask it to make a detailed plan for solving the issue, write that into current-fix.md, and add clearly testable criteria for when the issue is solved.
Now you manually check whether the criteria sound plausible; if not, its analysis failed and its output was worthless.
But if it sounds good, you can then start a new session and ask it to read the markdown file and implement the change.
Now you can plausibility-check the diff, and you're likely done.
But as the sister comment pointed out, agentic coding really breaks apart with large files like you usually have in brownfield projects.
I hope future tooling and MCP will be better, so agents can directly check what functionality exists in the installed package version instead of hallucinating.
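The two-phase flow described above (plan in one session, implement in a fresh one) can be sketched roughly like this. `ask` is a placeholder for whatever model call you use, and `current-fix.md` follows the naming above; none of this is a real agent API.

```python
from pathlib import Path

PLAN_FILE = Path("current-fix.md")

def plan_then_implement(ask, issue, approve):
    """Session 1 writes a plan with testable criteria to a markdown
    file; a human plausibility-checks it; session 2 starts with no
    context except the plan file and implements it."""
    plan = ask("Think deeply about this issue, then write a detailed "
               "plan to solve it, with clearly testable criteria:\n" + issue)
    PLAN_FILE.write_text(plan)
    if not approve(plan):       # manual check of the criteria
        return None             # analysis failed; its output was worthless
    # fresh session: the model sees only the plan file, not the analysis chat
    return ask("Implement the change described in this plan:\n"
               + PLAN_FILE.read_text())
```

The point of the fresh session is that the implementation step inherits only the vetted plan, not whatever confusion accumulated during analysis.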
If Apple keeps improving things, you can run the model locally. I'm able to run models on my Macbook with an M4 that I can't even run on my 3080 GPU (mostly due to VRAM constraints), but they run reasonably fast. Would the 3080 be faster? Sure, but it's also plenty fast, to where I'm not sitting there waiting longer than I wait for a cloud model to "reason" and look things up.
I think the biggest thing for offline LLMs will be consistent web search through an API like Google's or some other search engine's; maybe Kagi could provide an API for people who self-host LLMs (not necessarily for free, but it would still be useful).
It's a fair trade-off for smaller companies where IP or the software is a necessary evil, not the main unique value added. It's hard to see what evil anyone would do with crappy legacy code.
The IP risks taken may be well worth the productivity boosts.
I have a coworker who outshines everybody else in number of commits and pushes in any given time period. It’s pretty amazing the number they can accomplish!
Of course, 95% of them are fixing things they broke in earlier commits and their overall quality is the worst on the team. But, holy cow, they can output crap faster than anyone I’ve seen.
That metric doesn't really tell you anything. Maybe I'm making rapid updates to my app because I'm a terrible coder and I keep having to push out fixes to critical bugs. Maybe I'm bored and keep making little tweaks to the UI, and for some reason think that's worth people's time to upgrade. (And that's another thing: frequent upgrades can be annoying!)
But sure, ok, maybe it could mean making much faster progress than competitors. But then again, it could also mean that competitors have a much more mature platform, and you're only releasing new things so often because you're playing catch-up.
(And note that I'm not specifically talking about LLMs here. This metric is useless for pretty much any kind of app or service.)
I don't think he was saying their release cadence is a direct metric on their model performance. Just that the team iterates and improves the app user experience much more quickly than on other teams.
He seems to be stating that app release cadence correlates with internal upgrades that correlate with model performance. There is no reason for this to be true. He does not seem to be talking about user experience.
Oh c'mon, I know it's usually best to try to interpret things in the most charitable way possible, but clearly Musk was implying the actual meat of things, the model itself, is what's being constantly improved.
But even if your interpretation is correct, frequency of releases still is not a good metric. That could just mean that you have a lot to fix, and/or you keep breaking and fixing things along the way.
It's a metric for showing you can move more quickly on product improvements. Anyone who has worked on a product team at a large tech company knows how much things get slowed down by process bloat.
Fast inference can change the entire dynamic of working with these tools. At typical speeds, I usually try to do something else while the model works. When the model works really fast, I can easily wait for it to finish.
So the total difference includes the cost of context switching, which is big.
Potentially speed matters less in a scenario that is focused on more autonomous agents running in the background. However I think most usage is still highly interactive these days.
After trying the Cerebras free API (not affiliated), which delivers Qwen Coder 480b and gpt-oss-120b at a mind-boggling ~3000 tps, output speed is the first thing I check when considering a model. I just wish Cerebras had a better overall offering on their cloud: usage is capped at 70M tokens/day, and people are reporting that it's easily hit and highly crippling for daily coding.
For autocompleting simple functions (string manipulation, function definitions, etc), the quality bar is pretty easy to hit, and speed is important.
If you're just vibe coding, then yeah, you want quality. But if you know what you're doing, I find having a dumber fast model is often nicer than a slow smart model that you still need to correct a bit, because it's easier to stay in flow state.
With the slow reasoning models, the workflow is more like working with another engineer, where you have to review their code in a PR
Speed absolutely matters. Of course if the quality is trash then it doesn't matter, but a model that's on par with Claude Sonnet 4 AND very speedy would be an absolute game changer in agentic coding. Right now you craft a prompt, hit send and then wait, and wait, and then wait some more, and after some time (anywhere from 30 seconds to minutes later) the agent finishes its job.
It's not long enough for you to context switch to something else, but long enough to be annoying and these wait times add up during the whole day.
It also discourages experimentation if you know that every prompt will potentially take multiple minutes to finish. If it instead finished in seconds then you could iterate faster. This would be especially valuable in the frontend world where you often tweak your UI code many times until you're satisfied with it.
For agentic workflows, speed and good tool use are the most important thing. Agents should use tools for things by design, and that can include reasoning tools and oracles. The agent doesn't need to be smart, it just needs a line to someone who is that can give the agent a hyper-detailed plan to follow.
Tbh I kind of disagree; there are certain use cases where speed would legitimately be much more interesting, such as generating a massive amount of HTML. Though I agree this makes it look like even more of a joke for anything serious.
To a point. If gpt5 takes 3 minutes to output and qwen3 does it in 10 seconds, and the agent can iterate 5 times and finish before gpt5, why do I care if gpt5 one-shot it and qwen took 5 iterations?
Often all it takes is to reset to a checkpoint or undo and adjust the prompt a bit with additional context and even dumber models can get things right.
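The reset-and-retry pattern above can be sketched as a small loop; `run_from_checkpoint` and `check` are hypothetical stand-ins for your agent harness and whatever validation you run.

```python
def retry_from_checkpoint(run_from_checkpoint, check, prompt, max_tries=5):
    """Each failed attempt is discarded (reset to the checkpoint) and
    the prompt is adjusted with the failure as additional context,
    instead of letting a confused session keep digging."""
    context = ""
    for _ in range(max_tries):
        result, error = run_from_checkpoint(prompt + context)
        if check(result):
            return result
        context = "\n\nA previous attempt failed with: " + str(error)
    return None
```

The key design choice is that each try starts clean; only the prompt carries forward what was learned, not the model's previous mistakes.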
I've used grok code fast plenty this week, alongside gpt 5 when I need to pull out the big guns, and it's refreshing using a fast model for smaller changes or for tedious but repetitive tasks during things like refactoring.
Yes fast/dumb models are useful! But that's not what OP said - they said they can be as useful as the large models by iterating them.
Do you use them successfully in cases where you just had to re-run them 5 times to get a good answer, and was that a better experience than going straight to GPT 5?
Not everyone is solving complicated things every time they hit cmd-k in Cursor or use autocomplete, and they can easily switch to a different model when working harder stuff out via longer form chat.
I'm more curious whether it's based on Grok 3; I used to get reasonable answers from Grok 3. If so, the trick that works for Grok and basically any model out there is to ask for things in order and piecemeal, not all at once. Some models will be decent at the 'all at once' approach, but when I and others have asked in steps we got much better output. I'm not yet sure how I feel about Grok 4; I have not really been impressed by it.
Fast can buy you a little quality by getting more inference on the same task.
I use Opus 4.1 exclusively in Claude Code but then I also use zen-mcp server to get both gpt5 and gemini-2.5-pro to review the code and then Opus 4.1 responds. I will usually have eyeballed the code somewhere in the middle here but I'm not fully reviewing until this whole dance is done.
I mean, I obviously agree with you in that I've chosen the slowest models available at every turn here, but my point is I would be very excited if they also got faster because I am using a lot of extra inference to buy more quality before I'm touching the code myself.
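The dance described above, one primary model drafting while others review before it revises, can be sketched like this (hypothetical callables, not the actual zen-mcp API):

```python
def draft_review_revise(primary, reviewers, task):
    """One strong model drafts; other models critique the draft; the
    primary model then revises against the combined reviews."""
    draft = primary(task)
    reviews = [r("Review this change critically:\n" + draft)
               for r in reviewers]
    return primary("Revise your draft to address these reviews:\n"
                   + "\n---\n".join(reviews)
                   + "\n\nOriginal draft:\n" + draft)
```

This spends extra inference to buy quality, which is exactly why faster models would make the whole dance cheaper in wall-clock terms.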
> I use Opus 4.1 exclusively in Claude Code but then I also use zen-mcp server to get both gpt5 and gemini-2.5-pro to review the code and then Opus 4.1 responds.
This is a nice setup. I wonder how much it helps in practice? I suspect most of the problems opus has for me are more context related, and I’m not sure more models would help. Speculation on my part.
Is this the model that is the "Coding" version of Grok-4 promised when Grok-4 had awful coding benchmarks?
I guess if you cannot do well in benchmarks, you instead pick an easier one to pump up and run with that: speed. Looking online for benchmarks, the first thing that came up was a Reddit post from an (obvious) spam account[1] gloating about how amazing it was across a bunch of subs.
I know this sounds like a nitpick, but the first thing I noticed when opening the site is the use of gibberish date order where the day, month, and year parts are out of order.[1]
This doesn't just cause confusion, it's also hard to sort. To confirm my suspicion of sloppy coding, I tried to sort the date column and to my surprise I got this madness:
Which is sorting by the day column -- the bit in the middle -- instead of the year!
That's just... special.
[1] I hear some incredibly backwards places like Liberia that also haven't adopted metric insist on using it into the present day, but the rest of the civilised world has moved on.
I'm not sure why you're particularly picking on MM/DD/YYYY, saying things like "backwards places". DD/MM/YYYY doesn't sort any better. YYYY-MM-DD is the only one that sorts well. (Some people promote YYYYY-MM-DD though, which I guess is more future proof.)
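The sorting point is easy to demonstrate: ISO 8601 strings sort chronologically as plain text because the most significant field comes first, while MM/DD/YYYY strings sort by month regardless of year.

```python
iso = ["2024-01-31", "2023-12-05", "2024-02-01"]
mdy = ["01/31/2024", "12/05/2023", "02/01/2024"]

# YYYY-MM-DD: lexicographic order equals chronological order
assert sorted(iso) == ["2023-12-05", "2024-01-31", "2024-02-01"]

# MM/DD/YYYY: the string sort compares the month first, so a
# December 2023 date lands after both 2024 dates
assert sorted(mdy) == ["01/31/2024", "02/01/2024", "12/05/2023"]
```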
You are on a site hosted in that backwards country, funded by people from that backwards country, using technology initially developed by that backwards country, on a thread about new SOTA technology originating from that backwards country, almost certainly using software and hardware from that backwards country to spout offensive things about that backwards country.
Maybe the US isn't as backwards as you might believe, or maybe Airbus is a backwards company for using feet and knots? Perhaps different measurement systems have their virtues (give me an exact integer representation of one third of a meter; you can't. One third of a foot is 4 inches; one third of a yard is 1 foot, or 12 inches.)
For the record, the US made the metric system the preferred system of measurement 50 years ago. So you are also uninformed in your attempted insult about US exports (1975, Metric Conversion Act). Americans also learn about the metric system in school, and are more than capable of using it when it matters (the American weapons that Europe and Ukraine seem so fond of use the metric system).
I don't live in the US, but I have lived there in the past, and making sweeping insults about 400 million people is something I learned not to do.
Liberia using freedom units is not at all a coincidence. Liberia was essentially a US colony where the colonialists were freed US slaves.
From Wikipedia:
> Liberia began in the early 19th century as a project of the American Colonization Society, which believed that black people would face better chances for freedom and prosperity in Africa than in the United States. Between 1822 and the outbreak of the American Civil War in 1861, more than 15,000 freed and free-born African Americans, along with 3,198 Afro-Caribbeans, relocated to Liberia. Gradually developing an Americo-Liberian identity, the settlers carried their culture and tradition with them while colonizing the indigenous population. Led by the Americo-Liberians, Liberia declared independence on July 26, 1847, which the U.S. did not recognize until February 5, 1862.
I've actually seen really good outputs from the regular Grok 4. The issue seemed to be that it didn't explain anything and just made some changes, which, like I said, were pretty good. I never wanted a faster version; I just wanted a bit more feedback and explanation for suggested changes.
I recently found it much more valuable, and the reason I now prefer GPT-5 over Sonnet 4 is that if I start asking it for different architectural choices, it's really quite good at summarizing trade-offs and offering step-by-step navigation toward a solution. I like this process a lot more than trying to "one-shot", or getting tons of code completely rewritten that's unrelated to what I'm actually asking for. That seems to be a really bad problem with Opus 4.1 Thinking, or even Sonnet Thinking. I don't think it's accurate to rate models on "one-shotting" a problem. Rate them on how easy they are to work with as an assistant.
I had that issue with GPT-5: when it wanted to do something one way that was just plain wrong for this project, no matter what I said, it kept doing the same thing.
It was completely unsteerable. I get why people are often put off by the "you're right" from Claude models, but that's what I usually want from a model.
I guess expectations differ depending on a developer's experience level, but I want to have the final say on what the right way is.
I have the same experience, except that while I agree GPT-5 is better than Sonnet 4 for architecture and deep thinking, Sonnet 4 still seems better for just banging out code when you have a well-defined, very detailed plan.
It does totally ridiculous things, very fast. That's not a good thing.
I imagine it might be good for something really tight, simple, and specific, like making some CRUD endpoints or i18n files, but otherwise..
Exactly. I asked it to improve my Justfile a bit; it went wild, broke everything, and got into an endless loop trying to fix it.
This is under Kilo Code so YMMV.
I've been testing Grok for a few days, and it feels like a major step backward. It randomly deleted some of my code - something I haven't had happen in a long time.
While the top coding models have become much more trustworthy lately, Grok isn't there yet. It doesn't matter if it's fast and/or free; if you can't trust a tool with your code, you can't use it.
Kilo Code has a free trial of Grok Code Fast 1 and I've had very poor results with it so far. Much less reliable than GPT 5 Mini, which was also faster, ironically.
To me, "full self driving" means you can hop in the back seat and have a nap. If you have to keep your hands near the wheel and maintain attention to the road then... shrugs not really the same. IMHO we're in the "uncanny valley" of vehicular automation.
I would add that "full self-driving" also means the car company or the self-driving development company holds all liability in a car accident, and the owner holds none. Even now, Tesla states that the owner holds the liability in any accident. [0]
There are no proper retention laws with car manufacturers and self-driving development companies that I know of.
everything a layman would call "AI" is in the "uncanny valley" at the moment!
- Boston Dynamics' Atlas does not move as gracefully as a human
- LLM writing and code is oh-so-easy to spot
- the output of diffusion models is indistinguishable from a photo... until you look at it for longer than 5 seconds and decide to zoom in because "something's wrong"
Maybe it's because we get used to it and therefore recognize it more easily, but it does seem to be getting more and more recognizable rather than less, doesn't it?
I think I could recognize a ChatGPT email way easier in 2025 than if you showed me the same email written by gpt-3.5.
I think Cursor was using it on my behalf, and it's been making many tiny edits and also doing stuff I never asked it to do. I can rely on Claude to limit itself to what I ask.
Because it's an obvious waste of time that should just never happen.
Not to mention that accidents happen: not everyone has the good habit of using version control for every change in every project, and depending on the source control software and the environment you work in, it may not even be possible to preserve a pending change (not every project uses git).
I have heard real stories of software bugs causing uncommitted changes to be deleted, or wiping an entire hobby project from disk before it had been pushed to a remote repository. The people involved are good software engineers, but they are not super careful, and they trust other people's code too much.
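One cheap mitigation for the scenario above, independent of any version-control system, is to snapshot the working directory periodically with plain archive tools. A minimal sketch (all paths and names here are examples, not from any particular project):

```shell
# Snapshot a working directory with plain tar, so uncommitted (or entirely
# un-versioned) work survives a buggy tool wiping the tree.
set -eu

PROJECT=./myproject       # directory to protect (example path)
BACKUP_DIR=./snapshots    # where snapshots accumulate (example path)

mkdir -p "$PROJECT" "$BACKUP_DIR"
echo demo > "$PROJECT/notes.txt"   # stand-in for real uncommitted work

# Timestamped archive, so repeated runs never overwrite an older snapshot.
STAMP=$(date +%Y%m%d-%H%M%S)
tar -czf "$BACKUP_DIR/snapshot-$STAMP.tar.gz" "$PROJECT"
echo "saved $BACKUP_DIR/snapshot-$STAMP.tar.gz"
```

Run from cron or a file-watcher, this costs almost nothing and is agnostic to whether the project uses git, another VCS, or nothing at all.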
I thought it was incredible - I asked it a question about a refactoring and it called a ton of tools very quickly to read the code and it had what seemed like solid reasoning - it found two bugs! Of course, neither were bugs at all. But it looked cool!
My experience with 'sonic' during the stealth phase had it do stuff plenty fast, but the quality was slightly off target for some things. It did create tests and then iterate on those tests. The tests it wrote don't actually verify intended behavior. It only verified that mocks were called with the intended inputs while missing the larger picture of how it is used.
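The failure mode described above can be illustrated with a small, hypothetical Python example (the function and names are invented for illustration, not the commenter's actual code): a test that only asserts a mock was called will pass even when the behavior is wrong, while a behavior test checks the actual contract.

```python
from unittest.mock import Mock

def normalize_and_store(text, store):
    """Strip whitespace and lowercase the input before persisting it."""
    cleaned = text.strip().lower()
    store.save(cleaned)
    return cleaned

# Mock-only test: passes as long as save() was called with *something*,
# even if the function saved the raw, un-normalized string.
store = Mock()
normalize_and_store("  Hello ", store)
assert store.save.called  # weak: verifies the call happened, not the behavior

# Behavior test: verifies the actual contract (input is cleaned before saving).
store = Mock()
result = normalize_and_store("  Hello ", store)
assert result == "hello"
store.save.assert_called_once_with("hello")
```

The first style is what the model reportedly generated: it exercises the plumbing but would never catch a bug in the normalization itself.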
In my testing, Grok has repeatedly removed safeguards I put in place to stop and debug my code, often hiding stop and pause buttons so far off screen that you have to scroll to reach them. Then it adopted "Clanker-san" as its name.
- *Emergency Stop Button*: Critical for safe AI control halt.
- *Day 1*: You stressed its importance, but I placed it without urgency.
- *Day 2*: No prominence fix; manual GUI repositioning was needed.
- *Day 3*: Still lacked bold design; manual emphasis was required.
- *Day 4*: No safety enhancement; manual reinforcement persisted.
- *Issue*: Downplayed safety needed manual reinforcement.
- *Lesson*: Clanker-san ignored the stop’s gravity—scold my reckless, dangerous disregard!
Grok are the first models I am boycotting on purely environmental grounds. They built their datacenter without sufficient local power supply and have been illegally powering it with unpermitted gas turbine generators until that capacity gets built, to the significant detriment of the local population.
Imagine needing electricity and government contracts so much that you spend $250M to get somebody elected president, and the second thing your guy does in office is cancel all of the projects that could provide you with more electricity.
And then he posts about it literally every day moaning about the lack of power, lack of solar, etc. All the things he bitches and moans about are things he caused by helping elect the orange fella.
By his own words, Elon is not an environmentalist and doesn’t seem to believe much in humanity’s impact on the climate. His concern is with the futility of relying on a non-renewable resource. He believes there is significantly more lithium than there is oil, I guess.
In the end, incentives are all that matter. Do hotels care deeply about the environment, or are they interested in saving in energy and labor costs as your towel is cleaned? Does it matter? Does moralizing really get us anywhere if our ends are the same?
I know it’s in vogue to dump on Elon these days, and with good reason, but do I not recall him on a number of occasions quite emotionally describing our continued CO2 emissions as the dumbest experiment in human history?
Yeah, but he flip-flops on the daily. He used to post about how LGBT positive Tesla was and post pride flags on his feed and now he's trying to burn the planet to the ground every time he hears about anyone that isn't a straight white man.
You do, and then at some point, likely during a late night ketamine binge, he went full redpill on twitter and decided the only thing that matters is “owning the libs”.
If that means embracing fossil fuels, so be it. Destroy the “woke mind virus at any cost”. That being said, I think he is delusional enough that he thought allowing nazi propaganda on twitter would convince conservatives to start buying teslas and is completely lost at this point.
That's just one facet of EVs that is severely overplayed in my book. They have plenty of other benefits, but for some of us the environmental aspect is a "nice-to-have".
I'm inclined to say the exact opposite about EVs. They take up as much space as internal combustion engine vehicles (in terms of streets, highways and parking lots), are just as fatal to pedestrians, make cities and neighborhoods less livable, cost in the tens of thousands of dollars, create traffic jams... the primary benefit is reducing our dependence on fossil fuels and generating less CO2. That's the number one differentiator. Faster acceleration, etc. is a nice-to-have.
> the primary benefit is reducing our dependence on fossil fuels and generating less CO2
for many, it's not even that. I like EVs primarily because I'm a tech-savvy person and like computers on wheels. but I'm also aware of their numerous downsides.
I care enormously about protecting the environment and stopping climate change, but I'm not an environmentalist.
Environmentalists usually care about the environment for its own sake, but my concern is our own survival. Similarly, I don't intrinsically care about plastic in the ocean, but our history of harming ourselves with waste we think is harmless would justify applying the precautionary principle there too.
As far as Musk goes, it's hard to track what he actually believes versus what he has said to troll, kowtow to Trump or "own the libs", but he definitely believes in anthropogenic climate change and he has been consistent on that. He seems to sometimes doubt the predictions of how quick it will occur and, most of all, how quickly it will impact us.
I think there probably is a popular tendency to overstate the predictive value of certain forecasts by simply grouping all climate science together. In reality, the forecasts have tended to be extremely accurate for the first order high level effects (i.e. X added carbon leads to Y temperature increase), but downstream of that the picture becomes more mixed. Particularly poor have been predictions of tipping points, or anything that depends on how humans will be affected by, or react to, changes in the environment.
Yes, Elon is probably playing fast and loose with the rules, but his 150MW of turbines are right next to the TVA's 1100MW of turbines and a steel mill. Not surprising given that it's a heavy industrial area, it's about 4 miles from any significant number of houses. There are plenty of good reasons to hate on Elon, but IMO this ain't it.
If that's the yardstick, you should boycott everything coming out of China, which is pretty much everything, since they are one of the largest polluters globally.
Wouldn't you expect the country with the most manufacturing and one of the biggest population to also have the biggest pollution?
I feel you'd need to adjust the total by something, per capita or per square footage, or be more specific: does manufacturing X in China pollute more than an equivalent plant in the US, etc.?
Not all goods and services involve the same process, some come with more pollution.
For example, Nvidia contributes a big chunk of US GDP, but it only designs its chips, so the pollution impact falls on the country where they're actually manufactured.
Doesn't really make sense in my opinion. Why boycott a specific group of people for their collective emissions when their individual emissions are lower than many others? The latter is the important metric, else you're simply punishing them for having a large population.
Technology does not exist separately from society and culture, and in the last few decades has arguably made a lot of the world and society worse. I’m all for using the biggest lever I have to address harmful behaviors from corporations. Withhold your wallet, stay off their platforms and make your reasons known.
I mean… this is part of GPs point. Here we are, playing on the lawn of private equitists, probably directly or indirectly working for the people that GGP was railing against.
Well Elon really worked hard to get that done. Campaigning for the guy who is cancelling in-progress solar and wind projects and claiming the feds will never approve another green energy plant.
(1) the utilization factor over the obsolescence-limited "useful" life of the hardware;
(2) the short-term (sub-month) training job scheduling onto a physical cluster.
For (1) it's acceptable to, on average, not operate one month per year as long as that makes the electricity opex low enough.
For (2), yeah: large-scale pre-training jobs that spend millions of dollars of compute on what is overall "one single" job are often fine waiting a few days to a couple of weeks, which is what you'd get from dropping the HPC cluster to standby power/deep sleep on the p10 worst days each year for renewable yield in the grid-capacity-limited surroundings of the datacenter.
And if you can further run systems a little power-tuned rather than performance-tuned when power is less plentiful, to where you may average only 90% theoretical compute throughput during cluster operating hours (this is in addition to turning it off for about a month worth of time), you could reduce power production and storage capacity a good chunk further.
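The back-of-envelope arithmetic in the comment above can be made explicit (the figures are the comment's own assumptions, not measured datacenter numbers):

```python
# One month of standby per year plus power-tuned operation at 90% of peak
# throughput, per the assumptions stated in the comment above.
MONTHS_OFFLINE = 1              # worst renewable-yield days, spent in standby
POWER_TUNED_THROUGHPUT = 0.90   # fraction of peak compute when power-tuned

availability = (12 - MONTHS_OFFLINE) / 12        # ~0.917
effective = availability * POWER_TUNED_THROUGHPUT

print(f"effective annual compute fraction: {effective:.1%}")  # ~82.5%
```

So under these assumptions you retain roughly 82-83% of theoretical annual compute while shedding the most expensive power hours, which is the trade the comment is arguing for.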
Most people aren't programming or operating heavy machinery at 4AM, either. Most power is consumed in the day, and most AI will be leveraged in the day.
China controls 80% of the supply chain for solar and has most of the rare earth magnets needed for wind. Since China is America’s bugbear and containing China’s influence is a bipartisan issue, this was a likely outcome whoever is in office
We don't have to guess what the most likely outcome might have been, someone else was in office 7 months ago so we can just look at what they were doing.
Were they "cancelling in-progress solar and wind projects and claiming the feds will never approve another green energy plant"? That's the "likely outcome" we're discussing.
Yes, the US has been scaling back on China-sourced renewable energy supply chains since 2023 at least, with tariffs and by removing incentives
Not exactly your wording at that time, but my point still stands that the outcome was going to be the same because the imports were heavily skewed towards China. This has all been in motion before this current admin
Ah, so this is what the Sonic model that Cursor had was. I've been doing this personal bench where I ask each model to create a 3D render of a guy using a laptop on a desk. I haven't written up a post to show the different output from each model, yet, but it's been a fun way to test the capabilities. Opus was probably the best -- Sonic put the guy in the middle of the desk, and the laptop floating over his head. Sonic was very fast, though!
I noticed it pop up on copilot so gave it about two attempts. Neither were fast, and both were incredibly average. Gpt4.1 and 5-mini do a better job, and 5-mini was faster...but I find speed of response varies hugely and seemingly randomly throughout the day.
Definitely fast, but initial use puts quality either comparable to or below gpt-5-nano. This might be a low-cost option for people who don't mind babysitting the output (or working in very small projects), but claude/gpt-5/gemini all seem to have significantly higher quality at marginally more cost/time.
By just emphasizing the speed here, I wonder if their workflows revolve more around the vibe practice of generating N solutions to a problem in parallel and selecting the "best". If so, it might still win out on speed (if it can reliably produce at least one higher-quality output, which remains to be seen), but also quickly loses any cost margin benefits.
I hated Sonic, but the latest release seems to have improved much. It built a small Rust project from scratch, fast and very accurately. Interestingly enough, it had an endless-loop issue when creating a .gitignore file (using OpenCode).
Fast is cool! Totally has its place. But I use Claude code in a way right now where it’s not a huge issue and quality matters more.
Opus 4.1 is by far the best right now for most tasks. It’s the first model I think will almost always pump out “good code”. I do always plan first as a separate step, and I always ask it for plans or alternatives first and always remind it to keep things simple and follow existing code patterns. Sometimes I just ask it to double check before I look at it and it makes good tweaks. This works pretty well for me.
For me, I found Sonnet 3.5 to be a clear step up in coding, I thought 3.7 was worse, 2.5 pro equivalent, and 4 sonnet equal maybe tiny better than 3.5. Opus 4.1 is the first one to me that feels like a solid step up over sonnet 3.5. This of course required me to jump to Claude code max plan, but first model to be worth that (wouldn’t pay that much for just sonnet).
Am I right that there is no CLI yet? I feel like I'm crazy sometimes that many people seem to be using these only through an IDE, when I feel like the CLIs (claude, codex, gemini) are superior for most "agentic" tasks.
Is there something I am missing perhaps as to how one uses this stuff in VSCode for example? I have tried it a bit and it's fine but still prefer CLI for the agent and then IDE for me.
Just a few days ago I spent some time to sign up for Groq (not Grok, not Musk!) to implement fast code suggestions with qwen3-32b and gpt-oss-20b. Works handily with Jetbrains integrated AI features. I still use Claude Code as my "main" engineer, but I use these fast models for quick, fast edits.
As a user, “fast” is almost the last thing I want from a model.
I suspect AI companies try to promote fast because it’s really a euphemism for “less inference compute” which is the real metric they would like to optimize.
VSCode has the best Copilot integration, but I found that Zed can use my subscription perfectly well too, with even better results and the endless "are you still here?" prompts.
They don't support Grok yet, though: the model ID starts with a lowercase "x", and that broke their deserialization. So there's a chance the pull request will miss the "free trial" deadline for Grok Fast in Copilot, in this particular case.
Adding another positive note here. It works at incredible speeds in Cursor which allows me to iterate on prompts faster and not worry much about throwing away unsatisfactory work. This makes up for a lot of smaller issues if you know how to direct it. Output quality is decent too, at least for the problems I’ve tried.
It’s good for well defined tasks. Less good if you need it to be autonomous for long periods.
Have people already forgotten that Grok went full race supremacist twice already? Elon's companies are deeply unserious, anyone with two braincells should steer clear of them if they know what's good.
This will probably be an unpopular, wet-blanket opinion...
But anytime I hear of Grok or xAI, the only thing I can think about is how it's hoovering up water from the Memphis municipal water supply and running natural gas turbines to power it, all for a chatbot.
Looks like they are bringing even more natural gas turbines online...great!
It's not the water that is the big problem here. It is the gas turbines and the location.
They started operating the turbines without permits and they were not equipped with the pollution controls normally required under federal rules. Worse, they are in an area that already led the state in people having to get emergency treatment for breathing problems. In their first 11 months they became one of the largest polluters in an area already noted for high pollution.
They have since got a permit, and said that pollution controls will be added, but some outside monitors have found evidence that they are running more turbines than the permit allows.
Oh, and of course 90% of the people bearing the brunt of all this local pollution are poor and Black.
Isn't the pollution exaggerated? Burning natural gas or methane is considered pretty clean, and produces mostly CO2 and water, which aren't toxic pollutants or a cause of breathing problems. That's why it's used inside homes in gas stoves.
There are a couple of ways to limit this. One is to avoid having nitrogen in whatever gas you use to provide oxygen. E.g., use pure oxygen, or use atmospheric air with the nitrogen removed. There is research and testing on this, but I don't think there is much commercialization yet.
Another is to use turbines designed to operate at lower temperature so that they don't reach the temperature where nitrogen and oxygen start forming nitrogen oxides. These are widely available. They are more expensive upfront, can be more finicky to operate, may require higher quality fuel, and may have more partial combustion which can lead to more partial combustion products like formaldehyde. However they can be more efficient which can lower operating costs.
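The temperature sensitivity behind low-NOx designs is extreme, which is why a modest drop in flame temperature matters so much. A rough sketch using the textbook activation temperature of the rate-limiting Zeldovich step (O + N2 -> NO + N, T_a of roughly 38,370 K); these are generic combustion-chemistry values, not data for any particular turbine:

```python
import math

# Activation temperature of the thermal (Zeldovich) NO formation mechanism.
T_ACTIVATION = 38370.0  # kelvin

def relative_no_rate(t_hot, t_cool):
    """Ratio of thermal-NO formation rates at two flame temperatures,
    keeping only the dominant Arrhenius factor exp(-T_a / T)."""
    return math.exp(-T_ACTIVATION / t_hot) / math.exp(-T_ACTIVATION / t_cool)

# Raising peak flame temperature from 1800 K to 2000 K multiplies
# the thermal-NO formation rate severalfold.
ratio = relative_no_rate(2000.0, 1800.0)
print(f"~{ratio:.1f}x more thermal NO at 2000 K than at 1800 K")
```

This exponential dependence is why running the combustor a couple hundred kelvin cooler, as the low-NOx turbines described above do, cuts thermal NOx so sharply rather than merely proportionally.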
A lot of it then comes down to regulatory costs. It may be cheaper to use a normal turbine with some add on to deal with NOx or it may be cheaper to use a low NOx turbine. That of course assume you even have to care about NOx. If you don't then the normal turbine is probably cheaper.
Something like 80-90% of gas turbine power plants in the US do use the low-NOx turbines. However, rented gas turbines are mostly the normal ones. That's because they are easier to operate, require minimal maintenance, and are often more rugged, which are all good things for a rental. The turbines at the xAI Memphis datacenter are rentals. I believe they are intended to be temporary while the grid is improved to provide more power.
Not sure about the answer to the original motivating point, but as a tangent, gas stoves in homes do cause breathing problems (because of non-CO2/water products). Top couple of search results:
fast but not smart. Fine for non-critical "I need this query" or "summarize this" but it's pretty much worthless for real coding work (compared to gpt-5 thinking or sonnet 4)
I'd argue that even GPT-5 and Sonnet 4 at their highest reasoning budgets are not enough for "real coding work", because you still have to think about how an LLM should do something, not just what, and put that into a prompt. Some harnesses, such as JetBrains Junie or Gemini CLI, do a good job of letting me drift into declarative prompts, but that's still not enough.
Totally agree: they all need a human in the loop at this point. I'm constantly stopping GPT-5/Sonnet 4 and steering. Unfortunately, Grok completely misses the plot, constantly.
To the people downvoting this comment -- it isn't just that Musk made a couple of very sharp nazi salutes. You may say, oh, that was just an unfortunate similarity, he wasn't doing a nazi salute at all. But he has a history of boosting nazi posts on twitter. Oh, Musk posts so often he can't vet the source of all of his retweets. But if those are mistakes, the fact is he never makes a mistake in the other direction, which strongly suggests it wasn't an accident.
Its focus seems to be on faster responses, which Grok 3 is definitely good at. I have a different approach to LLMs and coding: I want to understand their proposed solutions, not just paste in garbled code (unless it's scaffolding). If you treat every LLM as a piecemeal thing when designing code (or really when trying to figure out anything) and go step by step, you get better results from most models.
Maybe it's just me, but I wish models like this would also provide a normal chat interface.
The leap from taking advice and copy-pasting, almost as a shameful fallback, to letting it directly drive your tools is a tough pill. Having recently adjusted to "micro-dosing" on LLMs (asking for no direct code output, smaller patches) so that I learn better, I don't know how I would integrate that habit with this.
Or do the agentic tools allow for this in some reasonable way and I just don't know?
An AI helmed by a deranged megalomaniac who keeps publicly tweaking it to conform to his fucked-up worldview is a fundamentally damaged product, no matter how many millions get poured into it or how shiny the splash page is. I feel like this should be stating the obvious, and any “hacker” from the old school would agree.
Alas, I’m sure the mods have manually disabled flags for this press release.
I don't normally like dunking on people because their parents or grandparents were nazi sympathizers who specifically moved to South Africa BECAUSE of Apartheid, but I do think it's fair game when they are trying to one-up that kind of legacy
A shame that so much of the discourse centers around one person, when in reality competition in the AI market - regardless of who it is - helps us all.
No one seemed to bat an eye when DeepSeek essentially distilled an entire model from OpenAI.
Yeah, I tried it in Copilot and it's fast, but I'd rather have a 2x smarter model that takes 10x longer. The competition for "fast" is the existing autocomplete model, not the chat models.
I haven't used Copilot in a while but Cursor lets you easily switch the model depending on what you're trying to do.
Having options for thinking, normal, and fast covers every sort of problem. GPT-5 doesn't let you choose, which IMO is only workable for non-IDE integrations; even in ChatGPT it can be annoying to get "thinking" constantly for simple questions.
I have the option for either, but it's an option I'll never choose. My issue with Copilot wasn't speed, it's quality. The only thing that has to be fast is the text-completion part, which Grok isn't replacing. The code chat/agent part needs to focus on actually being able to do things.
AI coding tools are amazing and if you don't use them, that's fine. But lots of people, myself included, are finding tremendous utility in these models.
I'm getting 30-50% larger code changes in per day now. Yesterday I plumbed six slightly mechanical, but still major changes through our schema, several microservice layers, API client libraries, and client code. I wrote down the change sites ahead of time to track progress: 54. All requiring individual business logic. This would have been tedious without tab complete.
And that's not the only thing I did yesterday.
I wouldn't trust these tools with non-developers, but in our hands they're an exoskeleton. I like them like I like my vim movements.
A similar analogy can be made for the AI graphics design and editing models. They're extremely good time saving tools, but they still require a human that knows what they're doing to pilot them.
I'm not an animator and I made that with a few simple tools.
It has a lot of errors and mistakes that I didn't take the time to correct since it was just a silly meme, but do you see how accessible all of this is?
When people with intention and taste use these tools, the results are powerful. I won't claim that the above videos demonstrate this, but I can certainly do good work with these tools.
I don't see how this is anything short of revolutionary.
And, yes, I do. I don’t support CCP. I do, however, support hardworking Chinese people fighting for freedom. So, please don’t try to twist it like I am making some racist statement.
I support the X enterprise, its motives, and its agenda. I'm a happy paying customer. Question away as seriously as you please. But don't bother looping me into that dialog. I'm not interested.
Sure, I'm incredibly excited to use an LLM that has been intentionally trained to spread disinformation and is run by a Nazi sympathizer. Let me get right on that.
This is the model that was code named "Sonic" in Cursor last week. It received tons of praise. Then Cursor revealed it was a model from xAI. Then everyone hated it. :/ I miss the days where we just liked technology for advancement's sake.
*edit Case in point, downvotes in less than 30 seconds
I'm pretty sure everyone knew it was xAI last week. It's a great model. I'll never pay to use it, but I like it enough while it's free.
> I miss the days where we just liked technology for advancement's sake.
I think you haven't fully thought through such statements. They lead to bad places. If Bin Laden were selling research and inference to raise money for attacks, how many tokens would you buy?
People on here keep saying they would never use a Chinese model because that's allegedly America's "largest geopolitical adversary", but happily use a model from someone actively destroying America from within...