I've been following the LLM news for a long time and am finally fed up with news like this:
<small-model> m with just n parameters BEATS/CRUSHES/OUTPERFORMS/OBLITERATES/DESTROYS <big-model> M that has 2n parameters (or even more) in all the benchmarks [0], and it's close to GPT-4 too.
^[0] All the benchmarks that support this claim.
And then later you find out that m was created by a startup that needs funding. When do we stop doing this? It's borderline spam at this point. Every day I see these posts and immediately skip them, because the benchmarks are meaningless now and whoever uses them to prove a point is just chasing the VC money.
It'd be nice if the mods on r/LocalLLaMA banned such posts, because they don't add any value to the discussion around LLMs; they're just noise. We need real progress in open-source LLMs, you know, like:
- decent function calling
- standardized API calling
- standardized prompt templates
- better documentation
- supporting projects like Petals
- working on LLMs on the edge (e.g., the MLC-LLM project)
- creating cool things with small LLMs such as Copilots for specific tasks
- raising awareness among ordinary users of ChatGPT alternatives
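To make the "standardized prompt templates" point concrete: today, every model family invents its own chat format, so the exact same conversation has to be re-serialized per model. A minimal sketch below renders one conversation into two formats that real open models actually use (ChatML and the Llama-2 chat format); the function names are my own, purely illustrative:

```python
# Illustrative sketch (function names are hypothetical): the same chat
# rendered into two incompatible real-world prompt formats. This per-model
# divergence is exactly what a standard template would eliminate.

def to_chatml(messages):
    """Render messages in the ChatML format used by several open models."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    # Trailing open assistant turn tells the model to start generating.
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

def to_llama2(messages):
    """Render a system turn plus one user turn in the Llama-2 chat format."""
    system = next(m["content"] for m in messages if m["role"] == "system")
    user = next(m["content"] for m in messages if m["role"] == "user")
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this thread."},
]

print(to_chatml(messages))
print(to_llama2(messages))
```

Get one of these delimiters slightly wrong and the model still answers, just worse — which is why quietly mismatched templates also quietly poison benchmark comparisons.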
But claiming that a small model is somehow better than bigger ones just because it was trained on ChatGPT/4 data so its responses look like GPT-4 answers is not honest. Yes, Mistral is good, but is it better than Llama 2 70B? Hard no. Yes, Phi-2 is good, but should we even care when Microsoft hardly shares its precious training data? Again, no. People get so excited about new models dropping literally every day that they forget to focus on the big picture and ask themselves: "Is this really good for the open-source ecosystem?"