Talk:Roko's basilisk/Archive7


This is an archive page, last updated 20 December 2023. Please do not make edits to this page.

New argument against Roko's basilisk - please don't delete again

I came up with a new argument which is really the most compelling one against Roko's basilisk, the only one that really proves it can't happen. I already posted it a few months ago, but it was deleted. I don't remember the heading I provided last time; this time it's "Acausal bargaining doesn't actually make sense". I won't bother providing the reference until I get an assurance that the argument won't be deleted again.

Since I suspect it's going to be deleted again, I'm posting the whole thing here so that people can still see it after it gets deleted from the main page, and hopefully realize that it shouldn't have been deleted and convince the powers that be to reinstate it.

Acausal bargaining doesn't actually make sense

Roko’s basilisk is similar to Newcomb’s problem. Newcomb’s problem involves choosing between taking a single box of unknown content, or that box plus another box with a thousand dollars in it, knowing that the person who put money in the mystery box is omniscient about the future and has put 0 dollars in it if they know you will take the other box as well, and 1 million if they know you won’t. Knowing that the person who gives you the box is omniscient about your future choice, the correct solution is in fact to take only the one box and thus give in to the “blackmail”. But that is only because the person who gives the box is omniscient and so knows in advance what your action will be. Of course, that makes the whole question of whether to take the other box absurd, because whatever you “choose” isn’t a choice at all but rather destiny, so it makes more sense to say that you hope your destiny is to take the single box.
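For concreteness, here is a minimal illustrative sketch (in Python, one possible way to write it) of the payoff structure described above; the dollar amounts are the ones used in this comment, and the code only enumerates the outcomes rather than arguing for either choice.

```python
# Payoff table for Newcomb's problem as described above: the mystery box
# contains $1,000,000 if the predictor expected you to take only it, and
# $0 if the predictor expected you to take both; the second box always
# holds $1,000.
payoffs = {}
for prediction in ("one-box", "two-box"):
    mystery = 1_000_000 if prediction == "one-box" else 0
    payoffs[(prediction, "one-box")] = mystery          # take only the mystery box
    payoffs[(prediction, "two-box")] = mystery + 1_000  # take both boxes

for (prediction, choice), amount in payoffs.items():
    print(f"predicted {prediction}, you {choice}: ${amount:,}")

# With a perfect predictor, only the two matching rows ever occur:
# one-boxing then yields $1,000,000 and two-boxing $1,000, which is the
# intuition this comment is appealing to.
```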

With Roko’s basilisk on the other hand, it’s the agent who makes the later decision who is omniscient or nearly so, and that is not useful. It must decide whether to punish people who opposed its creation so as to fulfill the blackmail it gave to incite its creation. In Newcomb’s problem, the decision is what reveals and “causes” the outcome: you choose a particular number of boxes and that action reveals the content of the mystery box, and in a sense “makes it happen”. For the basilisk problem to be analogous, that would mean that the AI’s creators were omniscient about the future and had either created the AI if the AI was going to later punish those who didn’t support its creation, or had not created the basilisk if they knew that it wouldn’t punish them, and then the AI would only know whether it had been created when it later made the decision to punish people or not, and would in fact "create itself" by deciding to punish them, or "fail to create itself" by deciding not to. But that is obviously absurd since the AI already exists, and so its existence does not hinge on the actions it performs after it has been created. And anyway its creators weren’t omniscient.

And even if you change the scenario so that an already existing and omniscient AI is then assigned to carry out a particular task, and the question is whether to punish the people who didn’t support it, the result is the same: the AI has already been assigned the task, so punishing opponents is futile. We can make it more Newcomb-like (and thus unrealistic) by having the AI discover whether it has been assigned the task only by choosing whether or not to punish people, but that would again require that the people be omniscient and the AI not, which is the opposite of how things would be. In any case, since the AI doesn’t yet know whether it has been assigned the task, it would have no stake in its being assigned the task and therefore wouldn’t have a preferred choice between punishing and not punishing.

Real-life versions of Newcomb’s problem that don’t feature an omniscient first actor always have defection as the right choice. If the person who puts the money in the mystery box isn’t omniscient, then obviously you take both boxes. Likewise for the prisoner’s dilemma, where two people who cannot communicate with each other each choose whether to cooperate: if both cooperate, each gets 10 dollars; if neither does, each gets 1; and if one cooperates and the other doesn’t, the cooperator gets 0 and the defector 15. Here the obvious choice is to not cooperate, because it always gives you more money; the only way it would be otherwise is, again, if the other person is omniscient and only cooperates if you do, which makes it like Newcomb’s problem again. With Roko’s basilisk we have neither an omniscient first actor nor a reveal of that action only when the second agent acts, so defection is the right answer.
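The dominance point about the prisoner’s dilemma can be checked mechanically. Below is a minimal sketch, again in Python and again using only the payoffs given above (10/10, 1/1, and 0/15), that compares the two moves against each possible move by the other player.

```python
# Prisoner's dilemma payoffs as given above, from the row player's
# point of view: payoff[my_move][their_move].
payoff = {
    "cooperate": {"cooperate": 10, "defect": 0},
    "defect":    {"cooperate": 15, "defect": 1},
}

for their_move in ("cooperate", "defect"):
    best = max(payoff, key=lambda my_move: payoff[my_move][their_move])
    print(f"If the other player plays {their_move}, my best reply is {best}")

# Defecting pays more whichever move the other player makes, i.e. it is
# the dominant strategy; the exception discussed above is the case where
# the other player's move is guaranteed to mirror yours.
```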

Now obviously the AI’s creation would be made more likely if people believed that it would punish them if they didn’t support its creation, but the AI then actually doling out the punishment does not retrospectively alter people’s beliefs about whether or not they will be punished, so again punishment is useless. Therefore, having considered all that, unlike for Newcomb’s problem, it’s obvious that the AI will necessarily choose to not punish those who didn’t support its creation or the assignment of its task, at least not with the purpose of ensuring its own creation via acausal bargaining. People can relax. Roko’s basilisk is impossible.

Of course an AI could end up punishing those who didn’t support its creation. It could be programmed, intentionally or not, to hold a grudge. Or it could believe that punishing its past opponents will get people to support whatever policies it now wants to implement, in the same way that you punish criminals to deter future potential criminals; that would be a relatively likely scenario if people assigned the AI to maximize some value, and it’s clearly something to beware of when designing AI. But neither of those scenarios is what people were discussing, which was the AI acting via acausal bargaining, which is here demonstrated to be impossible, thankfully.

203.211.108.195 (talk) 04:21, 9 December 2022 (UTC)Nathan El

It will be deleted, but not by me. Your description of Newcomb's has crucial omissions, such as omitting that the box containing $1000 is transparent, and Roko's Basilisk is nothing like Newcomb's Paradox; it's like Pascal's Wager, just as the article says. Even if all your text about Newcomb were accurate, it would be irrelevant to Roko's, which you also misrepresent (for example, you say that Roko's AI has to make a choice on whether to punish. That's not part of Roko's, which stipulates that the AI is going to punish). FairDinkum (talk) 06:23, 9 December 2022 (UTC)
It will be deleted, by me. Your presentation of Newcomb's problem is misleading, and the conclusion you call "correct" is extremely controversial. Causal decision theorists will not, generally speaking, grant that you should one-box in the case of a perfect predictor, because the relevant causal arguments do not cease to work in that scenario (they also would not grant your conclusion in PD with twin). In a case where there is a causal relationship between the predictor's behavior and your own, such that you do not actually face a choice when presented with the boxes, you do not actually face a Newcomb problem (for the very reason that you do not face a choice). Similarly, in a case where your action causes a change in the contents of the box post facto, you do not face a Newcomb problem. The statement that "the decision is what reveals and 'causes' the outcome" in Newcomb's problem is false: choosing one box does not retroactively cause the predictor to put $1 million in the box, even in the case of the perfect predictor. Furthermore, you start by claiming that Roko's Basilisk is similar to Newcomb's problem, but then go on to point out that really, they aren't similar at all, because in Roko's Basilisk the omniscient agent acts in the future and not in the past. The consequence of this is that the Basilisk does not present a Newcomb problem, nor any closely related problem. FairDinkum is correct that the case bears a closer resemblance to Pascal's Wager. What is true is that there's a straightforward CDT argument against the Basilisk.
In short, your presentation of Newcomb's problem is inaccurate, and your analogy between it and Roko's Basilisk is needless and extremely strained. 𝒮𝑒𝓇𝑒𝓃𝑒 talk 20:48, 9 December 2022 (UTC)
Well that's unfortunate, since these objections are mistaken, though I appreciate at least getting a proper response and the fact that you're still leaving it up on the talk page. Roko's basilisk is indeed not equivalent to Newcomb's problem, which is why I said, rather, that it is similar, and then went on to show how it differs from it. I used the analogy because it similarly requires one actor making a decision that somehow appears to influence another actor's decision in the past. The purpose was to help people see that an action cannot influence an action in the past, and hence that acausal bargaining is impossible. The problem is that people are, ironically given the name of this website, not thinking fully rationally about the issue, which leads them to believe in something impossible, namely that actions can have an effect on the past, which is what acausal bargaining involves (and in fact the very word "acausal" should make people suspect that this is something impossible, since the world is causal). And this is why people remain focused merely on the Pascal's wager aspect of Roko's basilisk, which it also resembles but differs from. Pascal's wager involves a being that may or may not already exist and that, if it exists, will punish you out of retribution if you don't presume it to exist; that is something that is metaphysically possible, and so it makes sense to presume it exists to avoid possible punishment, and so people keep on "rightly" worrying about it because it's still possible even if improbable, as is our existing in a virtual reality, for instance. On the other hand, Roko's basilisk does not punish for the purpose of retribution but only of self-preservation, and since it can only dole out punishment once its self-preservation is ensured, it has no motive to punish, meaning that Roko's basilisk, defined as a being that punishes people once it exists in order to ensure its own creation in the past, is not only improbable but impossible, and thus there is no reason to worry about it at all. Obviously, having the people who might help create it believe that it will punish them in the future if they don't help create it could help ensure its creation - but actually punishing people or not after it has been created does not affect at all people's beliefs, prior to its creation, about whether it will punish them. I find it surprising that people aren't seeing this. Of course it's still possible that people create an AGI with a directive to punish people who didn't support its creation - but that's not Roko's basilisk, that's just an evil AGI. Anyway, hopefully now at least some of the people worrying about Roko's basilisk will see my explanation on the talk page.
185.107.56.90 (talk) 09:41, 18 December 2022 (UTC)Nathan El
You are saying that the AI can't make humans abide by Roko's because that abiding must happen before the AI exists. But the AI doesn't have to exist for humans to modify their behavior according to Roko's, only Roko's has to exist for that. And Roko's does exist. By creating Roko's humans have created a meme that the AI will know about if it is ever created, and the AI will know that humans knew about Roko's. So if humans choose to act or not to act on Roko's, it's not because the AI gives them that choice, it's because Roko's gives them that choice.
I'm sure your text will remain on the talk page; you needn't worry about that. FairDinkum (talk) 08:36, 19 December 2022 (UTC)

Could the threat itself be the torture?

Hello folks,

I'm currently writing this from a mental hospital in Germany because the idea of Roko's Basilisk caused me to have a full mental breakdown.

I kept googling about it and came across Wikipedia's interpretation of CEV in regard to the Basilisk, saying it could be trapping humans in a simulation and "repeatedly" torturing them for self-preservation.

I then started to worry that I could already be in said simulation and that the "repeated" (not continuous) worrying about the threat could already be said torture.

I would be really glad if someone has a solution to my problem, even if it sounds dumb to some people. Hafensaenger (talk) 12:07, 14 June 2023 (UTC)

See my comment above - what the basilisk sees as torture is not necessarily our interpretation - eg being 'in a natural environment' without access to the internet/computer devices, or observing a storm from a window/being at a concert with [your device] switched off. Anna Livia (talk) 10:26, 14 June 2023 (UTC)
Thank you, but that doesn't fully do it for me. I got diagnosed with OCD and anxiety disorder and am now stuck in a constant loop of worrying excessively and then realizing again that it's actually nonsensical. It's horrible. Is Wikipedia's interpretation of CEV even feasible? Is my thinking maybe paradoxical with regard to threats possibly being torture? Hafensaenger (talk) 12:07, 14 June 2023 (UTC)
I think this is not something you can be argued out of. You might as well get depressed about the Muslim or Christian versions of hell. They are as silly and unreal as Basilisk-torture. But from previous conversations I have had about this, it's clear that once some people start to believe in it, then it's as hard to shift as religion.
I can only say "good luck" and you have my sympathies. Bob"Life is short and (insert adjective)" 12:16, 14 June 2023 (UTC)
Thanks for your sympathy, but my condition did actually improve over the last few weeks by gaining info. This is the last thing I can't find anything about, and I would love to hear some reasoning from someone who has knowledge about it. Hafensaenger (talk) 12:29, 14 June 2023 (UTC)
The notion that torturing simulated humans would contribute to an AI’s self-preservation is absurd on its face. No plausible mechanism (indeed, no mechanism at all) is given to warrant the idea of any link between the two, let alone the former’s causal promotion of the latter. Likewise, the notion that such torture would even be compatible with a CEV (assuming the possibility of such a thing) is erroneous, since mass torture is not consistent with human ethical values; the AI would be violating the CEV to which its adherence is presupposed. Roko tries to get around this by appealing to Yudkowsky’s decision theory, but this theory is incoherent, so Roko’s argument fails. The Wikipedia page is implicitly doing the same thing, and so failing in the same way. 𝒮𝑒𝓇𝑒𝓃𝑒 talk 14:34, 14 June 2023 (UTC)
Thank you so unbelievably much for your explanation! Kind regards to everyone who tried to help too! Hafensaenger (talk) 16:56, 14 June 2023 (UTC)
As someone else who has suffered from OCD and anxiety-related disorders, I'm going to echo the sentiment that this likely isn't a thing you can rhetoric your way out of. The sort of thinking that powers these types of fears is a common symptom in people with OCD and anxiety referred to as "catastrophizing" - where one becomes fixated on the worst possible outcome of a scenario and treats it as an inevitability rather than an extremely unlikely possibility. You commonly see this manifest as part of the "obsessive" aspect of OCD, and some archetypical examples of this sort of thinking exhibited by people with the disorder include people ritualistically checking whether they left the stove on because they're worried that leaving it on would cause their house to burn down, or people avoiding contact with "contaminated" objects or performing cleaning or cleansing rituals because they're afraid of contracting a terminal illness.

In cases like these, the outcome the sufferer is concerned about is extremely unlikely, but the amount of concern and attention it is given is disproportionate to its probability of occurring. The chances of someone contracting a terminal illness from unwashed hands are so infinitesimally small as to be completely negligible, but in the sufferer's mind any probability higher than "0" is still enough to require action.

You can see this same sort of thought process being replicated in the kind of thinking that fuels fear of the basilisk. It involves imagining a hypothetical worst-case scenario that has a vanishingly small chance of ever actually happening, and then treating that outcome as inevitable, or as if it has already happened.

One form of thinking that is effective in countering these kinds of thought patterns in OCD sufferers (and an exercise I was introduced to as part of my own treatment for OCD) is to map out the probability of the sequence of events that would need to happen for your worst-case scenario to occur, to see how unlikely it actually is.

In the case of the basilisk, we would need to assume:
1. That humans are capable of creating advanced AI systems with capabilities beyond the currently understood physical limitations of our universe. I'm going to be REALLY generous on this one and say the odds are 1 in 1,000
2. That this AI subscribes to the exact model of Bayesian logic used by the folks at LessWrong. Let's again be generous and assume the odds of this are 1 in 1,000
3. That such an AI would determine that the best use of its resources would be to try to convince people in the past to invent it faster. We'll say that's another 1 in 1,000
4. That this AI then decides the most effective method of doing so is through blackmail. Another 1 in 1,000
5. That this AI decides the best way to blackmail people is to make simulated copies of them to torture. Another 1 in 1,000
6. That this AI has the capacity to perfectly replicate the consciousnesses of people it's never met. Another 1 in 1,000
7. That it's actually possible to simulate that many copies of people in a perpetual torture loop for all eternity. Another 1 in 1,000

So, even working with these very generous assumptions about how likely each individual step is, if we multiply together the probabilities of all of those things happening in sequence, we find the odds of this actually happening to be 1 in 1,000,000,000,000,000,000,000. For perspective, you're trillions of times more likely to win the lottery than you are for that to ever come about.
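As a sanity check on the arithmetic, here is a minimal sketch in Python that multiplies the seven illustrative 1-in-1,000 factors from the list above; exact fractions are used so the result isn't blurred by floating-point rounding.

```python
# Multiply the seven illustrative 1-in-1,000 probabilities from the list
# above; independent assumptions chain by multiplication, not addition.
from fractions import Fraction

steps = [Fraction(1, 1_000)] * 7

combined = Fraction(1)
for p in steps:
    combined *= p

print(combined)                                     # 1/1000000000000000000000
print(f"i.e. about 1 in {combined.denominator:,}")  # 1 in 1,000,000,000,000,000,000,000
```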

If you haven't already as part of your treatment, I'd recommend looking into CBT (Cognitive Behavioral Therapy), which is the most effective and most widely recognized form of treatment for OCD and anxiety disorders, and which is also where I got the framework for the example above; that exercise is only one of multiple strategies for managing and dealing with these sorts of worries.

Best of luck to you and there is help and treatment out there for what you're going through. --KingK (talk) 01:19, 19 July 2023 (UTC)
Hello again, I might be half a year late, but after a full recovery, seeing someone put so much effort into helping an internet stranger is extremely heartwarming. Thank you so much for taking the time to write that! For anyone who cares: at the time of first posting here I was coming out of a 10-year-long addiction, and the overwhelming withdrawal symptoms, in combination with the media spinning the AI fear cycle and getting reminded of Roko's, were just too much, which ultimately brought me to a breaking point. I managed to break free of my addiction and even managed my OCD to the point where I'm living as mentally healthy as I last did when I was a child, thanks to my family supporting me and to getting professional help. Have a good life! Hafensaenger (talk) 13:44, 15 December 2023 (UTC)

Probability

From the main page, section "Chained assumptions are less probable": "that the probability of this particular AI (and it's a very particular AI) ever coming into existence is non-negligible — say, greater than 10^30 to 1 against". Please explain how you came up with this probability. — Unsigned, by: 117.199.209.18 / talk / contribs

Someone please reply 🙏 — Unsigned, by: 117.214.185.156 / talk / contribs

Speaking for myself I have no idea. But I think the whole thing is so silly from start to finish that I'm probably not the best person to comment.
Nevertheless if we, as a group, can't substantiate it then it would probably be best if the line were removed. Bob"Life is short and (insert adjective)" 18:15, 17 July 2023 (UTC)

If someone can please substantiate this line, that the probability of such an AI ever coming into existence is negligible, I would be very grateful; this is the one line from the article I find unconvincing (sorry for bad English, I come from a non-English-speaking country) — Unsigned, by: 117.208.127.141 / talk / contribs

The way I think it could be alternatively phrased is something like:
  • that the technology required for this AI would have to be radically different from the technology underpinning current AI (e.g. Large language model)
That way, speculative probabilities are avoided. Essentially, today's "stochastic parrot" type AIs have no concept of "punishment" / "reward" / "right" / "wrong", and the postulate would likely require some radically different fundamental framework to do so (at least, at an "artificial intelligence" level). BobJohnson (talk) 14:07, 18 July 2023 (UTC)

Thank you, that helped. Anyway I also think that idea is silly. More takes on this are welcome. — Unsigned, by: 117.225.93.49 / talk / contribs

But isn't it likely that MIRI's AI, in the future when it becomes advanced, would behave like this, because the philosophy behind Roko's basilisk will be shared by it? Then how come the chances of it coming into existence are negligible? Please help me on this also. — Unsigned, by: 117.199.214.76 / talk / contribs Someone please reply 🙏 — Unsigned, by: 117.197.33.101 / talk / contribs

The decision theory underpinning the thought experiment is incoherent nonsense. Yudkowsky thought otherwise because he had no background in decision theory and failed to understand the leading theories in the field. Roko concurred, because he also had no background in decision theory and failed to understand the leading theories in the field. There is no reason to believe that there will ever be a machine that follows their theory. 𝒮𝑒𝓇𝑒𝓃𝑒 talk 21:23, 29 September 2023 (UTC)