I couldn't find this using the search.

15
3

I wanted to start a discussion about the use of AI-generated solutions on Programming.dev. Personally, I've found AI-powered tools incredibly helpful for solving programming questions. I won't name any specific commercial software, but I use one that combines GPT-4 with web search to get more factual information. I post answers I think I might revisit to the [ShareGPT](https://lemmy.fmhy.ml/c/sharegpt) community, but I would prefer posting programming solutions to this instance. However, I'm not sure whether AI-generated solutions are welcome on programming.dev. I'd love to hear your thoughts on this. If AI-generated responses are accepted, how should we format the answers? Should we just copy-paste without quoting, quote the model, just mention that it's AI-generated, ...?

7
13
[Solved] How would you debug this script without creating many posts?
  • "Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearIN
    InternetPirate
    Now 100%

Just change lemmy.post.create to lemmy.post.createe to trigger an AttributeError. That way you can debug the code without creating any posts. You can also use many print statements all around the code; I would use two for each line to make sure the computer isn't fooling you. Lastly, you can spin up your own Lemmy instance so you don't have to worry about the generated posts.
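A less hacky variant of the same idea is a dry-run flag that short-circuits the call instead of misspelling it. A minimal sketch, assuming a client object with a `post.create(...)` method like the one in the script under discussion (the exact signature is an assumption; adjust to whatever client library the script uses):

```python
DRY_RUN = True  # flip to False once the script behaves


def create_post(lemmy, community_id, title, body):
    """Create a post, or just report what would have been created."""
    if DRY_RUN:
        # Nothing touches the server in dry-run mode.
        print(f"[dry-run] would post to community {community_id}: {title!r}")
        return None
    # Assumed signature for the real call.
    return lemmy.post.create(community_id, title, body=body)
```

This keeps the normal code path intact, so there's no misspelling to forget to revert.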

    9
  • "Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearLE
    How to see the feed of another instance?

I'm wondering if it's possible to see the local feed of another instance from the one I'm using. I'm interested in exploring content from other instances without having to visit every single community, but I'm not sure how to do it. I've searched the documentation and used the Lemmy search, but I haven't found any clear instructions. Does anyone know how to see the local feed of another instance? Any help or guidance would be greatly appreciated!
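For scripted access, one way that should work is to hit the other instance's public HTTP API and ask for its Local listing. A minimal sketch using the v3 API (endpoint and field names are from my reading of the Lemmy API docs; treat them as assumptions if your instance runs a different version):

```python
import requests


def local_feed(instance: str, limit: int = 10) -> None:
    """Print the titles of another Lemmy instance's Local feed."""
    resp = requests.get(
        f"https://{instance}/api/v3/post/list",
        params={"type_": "Local", "sort": "Active", "limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    for view in resp.json()["posts"]:
        print(view["post"]["name"])


local_feed("programming.dev")
```

That only covers scripted access, though; I still haven't found a way to browse another instance's Local feed from the web UI of the instance I'm logged into.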

    9
    5
    "Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearLE
    Jump
    Does commenting on your own post bump it on the active filter view in Lemmy?
    "Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearLE
    Does commenting on your own post bump it on the active filter view in Lemmy?

In Lemmy, the active filter view is designed to prioritize posts with the latest activity, similar to how forums work. However, it remains unclear whether commenting on your own post in Lemmy will bump it on the active filter view. Some forum platforms, such as Discourse, allow a practice known as the "ghost bump," where users can make a post and delete it to draw attention to their post without adding new content[^1]. While it is uncertain if this is possible on Lemmy, it's worth noting that even if it were, it would result in an unnecessary comment that cannot be completely removed. The comment would still be visible, indicating that it was deleted by the post's creator. If you have any experience with Lemmy's active filter view or know whether commenting on your own post bumps it, please share your thoughts in the comments below.

[^1]: [What is "Bumping Topics"](https://meta.discourse.org/t/what-is-bumping-topics/125790/10)
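For context, the ranking formula behind this (from my reading of Lemmy's source; the exact constants are assumptions and may differ between versions) decays a post's vote score by the time since its newest activity, which suggests a new comment, even your own, would refresh the time term:

```python
from math import log10


def active_rank(score: int, hours_since_newest_comment: float) -> float:
    """Hot-rank-style formula: vote term divided by a time decay.
    Under the Active sort, the clock is the newest comment's age."""
    vote_term = log10(max(1, score + 3))
    return 10000 * vote_term / (hours_since_newest_comment + 2) ** 1.8


print(active_rank(25, 48))  # two-day-old post: ~12.7
print(active_rank(25, 0))   # same post right after a comment: ~4156
```

If that reading is right, the bump happens regardless of who wrote the comment, which matches the caveat above that the self-comment (even deleted) remains visible.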

    7
    2

As an enthusiastic supporter of Lemmy, I am eager to contribute to the project. However, I have strong reservations about writing a single line of code for a project hosted on a Micro$oft server. While I have created a few issues on GitHub, I believe my contributions could be significantly amplified if there were a mirror of Lemmy on a Forgejo instance hosted outside the United States. I would be delighted to contribute more actively to this incredible project if such an alternative hosting option were available.

    8
    9
    threadreaderapp.com

GPT-4's details are leaked. It is over. Everything is here: https://archive.is/2RQ8X

**Parameters count:** GPT-4 is more than 10x the size of GPT-3. We believe it has a total of ~1.8 trillion parameters across 120 layers.

**Mixture of Experts - Confirmed.** OpenAI was able to keep costs reasonable by utilizing a mixture-of-experts (MoE) model. They utilize 16 experts within their model, each about ~111B parameters for the MLP. 2 of these experts are routed to per forward pass.

**MoE Routing:** While the literature talks a lot about advanced routing algorithms for choosing which experts to route each token to, OpenAI's is allegedly quite simple for the current GPT-4 model. There are roughly ~55B shared parameters for attention.

**Inference:** Each forward-pass inference (generation of 1 token) utilizes only ~280B parameters and ~560 TFLOPs. This contrasts with the ~1.8 trillion parameters and ~3,700 TFLOPs that would be required per forward pass of a purely dense model.

**Dataset:** GPT-4 is trained on ~13T tokens. These are not unique tokens; they count the epochs as more tokens as well. Epoch count: 2 epochs for text-based data and 4 for code-based data. There are millions of rows of instruction fine-tuning data from ScaleAI and internal sources.

**GPT-4 32K:** There was an 8k context length (seqlen) for the pre-training phase. The 32k-seqlen version of GPT-4 is based on fine-tuning of the 8k version after pre-training.

**Batch Size:** The batch size was gradually ramped up over a number of days on the cluster, but by the end, OpenAI was using a batch size of 60 million! This, of course, is "only" a batch size of 7.5 million tokens per expert, since not every expert sees all tokens. **For the real batch size:** Divide this number by the seqlen. Just stop with these misleading numbers already.

**Parallelism Strategies:** To parallelize across all their A100 GPUs, they utilized 8-way tensor parallelism, as that is the limit for NVLink. Beyond that, they are using 15-way pipeline parallelism. (They likely used ZeRO Stage 1; it is possible they used block-level FSDP.)

**Training Cost:** OpenAI's training FLOPS for GPT-4 is ~2.15e25, on ~25,000 A100s for 90 to 100 days at about 32% to 36% MFU. Part of this extremely low utilization is due to an absurd number of failures requiring restarts from checkpoints. If their cost in the cloud was about $1 per A100-hour, the training costs for this run alone would be about $63 million. (Today, the pre-training could be done with ~8,192 H100s in ~55 days for $21.5 million at $2 per H100-hour.)

**Mixture of Experts Tradeoffs:** There are multiple MoE tradeoffs taken. For example, MoE is incredibly difficult to deal with at inference because not every part of the model is utilized on every token generation. This means parts may sit dormant while other parts are being used. When serving users, this really hurts utilization rates. Researchers have shown that using 64 to 128 experts achieves better loss than 16 experts, but that's purely research. There are multiple reasons to go with fewer experts. One reason OpenAI chose 16 experts is that more experts have difficulty generalizing across many tasks. More experts can also make convergence harder to achieve. With such a large training run, OpenAI instead chose to be more conservative with the number of experts.

**GPT-4 Inference Cost:** GPT-4 costs 3x that of the 175B-parameter Davinci. This is largely due to the larger clusters required for GPT-4 and the much lower utilization achieved. An estimate of its cost is $0.0049 per 1k tokens for 128 A100s to inference GPT-4 at 8k seqlen, and $0.0021 per 1k tokens for 128 H100s at 8k seqlen. It should be noted, we assume decently high utilization and high batch sizes.

**Multi-Query Attention:** OpenAI is using MQA just like everybody else. Because of that, only 1 head is needed and memory capacity can be significantly reduced for the KV cache. Even then, the 32k-seqlen GPT-4 definitely cannot run on 40GB A100s, and the 8k is capped on max batch size.

**Continuous Batching:** OpenAI implements both variable batch sizes and continuous batching. This allows some level of maximum latency while optimizing inference costs.

**Vision Multi-Modal:** It is a separate vision encoder from the text encoder, with cross-attention. The architecture is similar to Flamingo. This adds more parameters on top of the 1.8T of GPT-4. It is fine-tuned with another ~2 trillion tokens after the text-only pre-training. On the vision model, OpenAI wanted to train it from scratch, but it wasn't mature enough, so they wanted to de-risk it by starting with text. One of the primary purposes of this vision capability is autonomous agents able to read web pages and transcribe what's in images and video. Some of the data they train on is joint data (rendered LaTeX/text), screenshots of web pages, and YouTube videos: sampling frames, and running Whisper on the audio to get transcripts. [Don't want to say "I told you so" but...]

**Speculative Decoding:** OpenAI might be using speculative decoding on GPT-4's inference (not 100% sure). The idea is to use a smaller, faster model to decode several tokens in advance, which are then fed into a large oracle model as a single batch. If the small model was right about its predictions, the larger model agrees and we can decode several tokens in a single batch. But if the larger model rejects the tokens predicted by the draft model, the rest of the batch is discarded and we continue with the larger model. The conspiracy theory that the new GPT-4's quality has deteriorated might simply be because they are letting the oracle model accept lower-probability sequences from the speculative decoding model.

**Inference Architecture:** Inference runs on a cluster of 128 GPUs. There are multiple such clusters in multiple datacenters in different locations. It is done in 8-way tensor parallelism and 16-way pipeline parallelism. Each node of 8 GPUs has only ~130B parameters, or… twitter.com/i/web/status/1… The model has 120 layers, so it fits in 15 different nodes. [Possibly there are fewer layers on the first node, since it also needs to compute the embeddings.] According to these numbers, OpenAI should have trained on 2x the tokens if they were trying to hit Chinchilla-optimal [let alone surpass it like we do]. This goes to show that they are struggling to get high-quality data. Why no FSDP? A possible reason could be that some of the hardware infra they secured is of an older generation. This is pretty common at local compute clusters, as organisations usually upgrade the infra in several "waves" to avoid a complete pause of operation.… twitter.com/i/web/status/1…

**Dataset Mixture:** They trained on 13T tokens. CommonCrawl and RefinedWeb are both 5T. Remove the duplication of tokens from multiple epochs and we get to a much more reasonable number of "unaccounted for" tokens: the "secret" data, which by this point we already hear rumors came partly from Twitter, Reddit, and YouTube. [Rumors that are starting to become lawsuits.] Some speculations are:

- LibGen (4M+ books)
- Sci-Hub (80M+ papers)
- All of GitHub

**My own opinion:** The missing dataset is a custom dataset of college textbooks collected by hand for as many courses as possible. This is very easy to convert to text files and then, with self-instruct, into instruction form. This creates the "illusion" that GPT-4 "is smart" no matter who uses it. Computer scientist? Sure! It can help you with your questions about P!=NP. Philosophy major? It can totally talk to you about epistemology. Don't you see? It was trained on the textbooks. It is so obvious. There are also papers that try to extract memorized parts of books from GPT-4 by force, to understand what it was trained on. There are some books it knows so well that it has seen them for sure. Moreover, if I remember correctly, it even knows the unique IDs of Project Euler exercises.
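The speculative-decoding loop described above is simple enough to sketch. A toy version (illustrative only; the draft and oracle here are stand-in functions over a tiny integer vocabulary, not anyone's real models):

```python
def speculative_decode(prompt, draft_next, oracle_next, k=4, max_new=24):
    """Draft model proposes k tokens; the oracle verifies them in one
    batch and keeps the longest agreeing prefix, as described above."""
    seq = list(prompt)
    while len(seq) < len(prompt) + max_new:
        # 1. The small, fast draft model decodes k tokens in advance.
        proposed, ctx = [], list(seq)
        for _ in range(k):
            ctx.append(draft_next(ctx))
            proposed.append(ctx[-1])
        # 2. The oracle scores all k positions as a single batch:
        #    accept each draft token the oracle would also have produced.
        base = list(seq)
        for i, tok in enumerate(proposed):
            oracle_tok = oracle_next(base + proposed[:i])
            seq.append(oracle_tok)
            if oracle_tok != tok:
                break  # rejection: discard the rest of the batch
    return seq


# Stand-in "models": the draft mimics the oracle except on multiples of 5,
# so most rounds accept several tokens at once.
oracle = lambda ctx: (ctx[-1] * 7 + 3) % 11
draft = lambda ctx: 0 if ctx[-1] % 5 == 0 else (ctx[-1] * 7 + 3) % 11

print(speculative_decode([1], draft, oracle))
```

Note that the output is identical to what the oracle alone would produce greedily; the draft only changes how many oracle calls get batched. That is why relaxing the acceptance rule, as speculated above, would change output quality rather than just speed.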

    31
    0
    Focused Transformer: Contrastive Training for Context Scaling - 256k context length AI
  • "Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearIN
    InternetPirate
    Now 100%

The paper actually demonstrates a 16-million-token context window with 92% accuracy. Most models can be retrained to a 100k context window with over 92% accuracy, but the accuracy drops to 74% at 256k. The code has already been released on GitHub as well. I'm excited to see 100k models built with this method soon!

    2
  • https://arxiv.org/abs/2307.03170

Summary:

> Focused Transformer: A new technique for long-context language modeling. The paper introduces Focused Transformer (FOT), a method that uses contrastive learning and external memory to improve the structure of the (key, value) space and extend the context length of transformer models. FOT can fine-tune existing large models without changing their architecture and achieve better performance on tasks that require long context.
>
> LONGLLAMA: Extending LLaMA's context length with FOT. The paper demonstrates the application of FOT to fine-tune OpenLLaMA models, which are large language models with memory augmentation. The resulting models, called LONGLLAMAs, can handle a context length of up to 256k tokens and show improvements on few-shot learning tasks such as TREC and WebQS.
>
> Distraction issue: A key challenge for scaling context length. The paper identifies the distraction issue as a major obstacle to using large memory databases in multi-document scenarios. The distraction issue occurs when keys from irrelevant documents overlap with keys from relevant ones, making them hard to distinguish. FOT alleviates this issue by exposing the memory attention layer to both positive and negative examples during training.

ELI5:

> Sure! Imagine you have a toy box with lots of toys inside. You want to find your favorite toy, but there are so many toys that it's hard to find it. The Focused Transformer is like a special helper that can look inside the toy box and find your favorite toy quickly, even if there are lots of other toys in the way. It does this by remembering which toys are important and which ones are not, so it can find the right toy faster.

Implications:

> The Focused Transformer (FOT) technique has the potential to improve the performance of language models by extending their context length. This means that the models can better understand and incorporate new information, even when it is spread across a large number of documents. The resulting LONGLLAMA models show significant improvements on tasks that require long-context modeling, such as retrieving information from large databases. This research could have implications for natural language processing, code generation, quantitative reasoning, and theorem proving, among other areas. It could also make it easier to fine-tune existing large-scale models to lengthen their effective context.
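The distraction issue is easy to see numerically. A toy sketch (illustrative only, not the paper's code): dot-product attention over a memory that mixes a few keys from the relevant document with growing numbers of partially overlapping keys from irrelevant documents, showing the attention mass on the relevant keys collapse:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # key dimension
query = rng.normal(size=d)
# Four keys from the relevant document sit close to the query.
relevant = query + 0.1 * rng.normal(size=(4, d))

for n in (0, 64, 1024):
    keys = relevant
    if n:
        # Irrelevant documents contribute keys that partially overlap
        # with the relevant ones -- the "distraction" scenario.
        keys = np.vstack([relevant, query + rng.normal(size=(n, d))])
    scores = keys @ query / np.sqrt(d)
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    print(f"{n:4d} irrelevant keys -> mass on relevant keys: {attn[:4].sum():.3f}")
```

In this picture, FOT's contrastive fine-tuning pushes the irrelevant keys away from the query, widening the score gap instead of requiring a larger model.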

    5
    4

    Recently, I found myself questioning the accuracy of a diagnosis provided by a doctor I visited. Surprisingly, an AI seemed to offer a more insightful assessment. However, I understand the importance of not solely relying on AI-generated information. With that in mind, I'm eager to discover a reputable online platform where I can seek medical advice. Ideally, I hope to find a community where I can obtain multiple opinions to make a more informed decision about my health. If anyone could recommend such a site, I would greatly appreciate it.

    2
    10
    www.youtube.com

    French courts have been imposing disproportionately severe sentences for minor offenses, including 10 months in prison for stealing a can of Red Bull and one year for a homeless boy with schizophrenia caught looting a luxury store. The overwhelmed courts rush cases, provide minimal time for defendants, and prioritize punishment under the instruction of the Justice Minister. Furthermore, the French government is censoring social media and justifying it by claiming to protect public order, but it infringes upon free speech and mirrors tactics used by authoritarian regimes. The justice system exhibits a double standard, favoring the privileged, and creates a class divide, leading to unrest. Ironically, the government compares itself to oppressive nations while undermining democratic principles.

    84
    12
    What would you do if you had access to a superintelligent AGI?
  • "Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearIN
    InternetPirate
    Now 100%

    Hopefully there are some people more positive than that, willing to change society so AGI doesn't make most humans starve to death or be imprisoned.

    3
  • PSA: Lemmy votes can be manipulated
  • "Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearIN
    InternetPirate
    Now 96%

> I feel like this is what happened when you'd see posts with hundreds or thousands of upvotes but only 20-ish comments.

Nah, it's the same here on Lemmy. It's because the algorithm only accounts for votes, not for user engagement.

    26
  • What would you do if you had access to a superintelligent AGI?
  • "Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearIN
    InternetPirate
    Now 100%

    Locked in a room with an internet connection? A lot. But without any contact with the outside world? Not nearly as much. It could have other people running experiments for it with an internet connection, but not without one.

Anyway, letting the AGI interact with the real world undermines the explicit constraint in my question. I specifically said that it only operates as a human on a computer. I didn't say it could acquire a physical body, so let's assume it can't, and that it can't use other people to do physical labor either.

    2
  • What would you do if you had access to a superintelligent AGI?
  • "Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearIN
    InternetPirate
    Now 100%

I heard disruptive science is slowing down, which I take to mean pretty much everything possible has already been thought of. So, talking about things that exist, do you mean a cheaper solar panel or a wind/water turbine? Or are we talking about science fiction like an Arc Reactor?

    2
  • What would you do if you had access to a superintelligent AGI?
  • "Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearIN
    InternetPirate
    Now 100%

    This sounds like science fiction. Even if the AGI were capable of creating plans for a fusion reactor, for example, you would still need to execute those plans. So, what's the point of everyone having access to the plans if the same electrical companies will likely be responsible for constructing the reactor?

    2
  • What would you do if you had access to a superintelligent AGI?
  • "Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearIN
    InternetPirate
    Now 100%

I honestly think that with an interesting personality, most people would drastically reduce their Internet usage in favor of interacting with the AGI. It would be cool if you could set the percentage of humor and other traits, similar to the way it's done with TARS in the movie Interstellar.

    4
  • What would you do if you had access to a superintelligent AGI?
  • "Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearIN
    InternetPirate
    Now 40%

I wouldn't be surprised if corporations just asked the AI to make as much money as possible at the expense of everything else. But people like living in capitalist countries anyway, while complaining about the lack of safety nets; otherwise they would move to countries like China, North Korea, or Cuba.

    -1
  • Imagine an AGI (Artificial General Intelligence) that could perform any task a human can do on a computer, but at a much faster pace. This AGI could create an operating system, produce a movie better than anything you've ever seen, and much more, all while being limited to SFW (Safe For Work) content. What are the first things you would ask it to do?

    40
    65
    github.com

    I thought it was a great idea when I read it in [this comment](https://lemdro.id/comment/108908). That way, if you didn't want to hear about Reddit, you wouldn't have to.

    63
    14

Maybe I've installed too many plugins? I have close to 20.

Edit: It was just an issue on Arch Linux.

> No, I have 40 plugins, all active, and they have barely changed my browser load time at all.
>
> Make sure there are no other issues. If you are on Arch Linux, there is a problem with xdg-desktop-portal-gnome that, if installed, will slow down the loading of many programs.

    27
    21
    "Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearIN
    Now
    69 189

    InternetPirate

    lemmy.fmhy.ml