ARG:Lunacy

Latest revision as of 18:21, 10 October 2024

This page contains spoilers about the Heartbound ARG. If you want to solve the puzzle yourself, avoid reading this article.

Lunacy was found by solving Painful Memory. That led us to the page https://gopiratesoftware.com/games/Heartbound/CORE_DUMP/MEMORY_ALPHA/CRASH_LOG/BOOT_RECORD/DEEP_THOUGHT/2140425/U4042/HEAPCHK/U4043/RESISTOR_TWISTER/FINAL_MACHINE/

Purple moon with some spots on it (moonspots.png)

The page contains a purple moon and the following comments in its source code:

<!-- They came from the moon. -->
<!-- ISZPGUHXFYCCAKQQWUQACBFALHV -->
<!-- FFOIWTWSDACBFALH -->

Moon spots

By looking at the page's source code, we find that the purple moon image is called moonspots.png. After downloading the image and opening it in a text editor (like Notepad), we notice some text at the end of the file, after the image data has ended.

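The text does not end with =, the padding character often seen at the end of Base64. However, padding only appears when the length of the encoded data is not a multiple of 3, so its absence does not rule Base64 out. Base64 has also been used quite a few times in this ARG to encode images as text, so it is not unreasonable to assume that this is Base64 too. If we input the text into an online decoder, we can notice the characters PNG at the beginning of the output, which suggests that the text encodes a PNG image.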
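Every PNG file starts with the same 8-byte signature, so Base64 text that encodes a PNG always begins with the characters iVBORw0KGgo. The sketch below checks for that signature; the function name looks_like_base64_png is just an illustration, not something taken from the ARG.

```python
import binascii
from base64 import b64decode

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_base64_png(text: bytes) -> bool:
    """Check whether Base64 text starts with the encoded 8-byte PNG signature."""
    compact = b"".join(text.split())  # ignore line breaks and spaces
    try:
        # 12 Base64 characters decode to 9 bytes, enough to cover the signature
        head = b64decode(compact[:12], validate=True)
    except binascii.Error:
        return False
    return head.startswith(PNG_SIGNATURE)

print(looks_like_base64_png(b"iVBORw0KGgoAAAANSUhEUg"))  # True
```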

From this point you can just paste the text into a text document and upload it to the aforementioned online decoder to get it converted into an image, and that will work fine. But we find it more interesting to show one way you can build your own tool for that, using Python.

Extracting the image

A valid PNG file always ends with an IEND chunk: the characters IEND, followed by 4 bytes holding the CRC32 checksum of that chunk. That is where the file should normally end, and anything that comes after it is ignored by image decoders. This is where our text comes in: we need to extract everything that comes after the end of the PNG file, and then decode it using Base64.

Here is one way to do that in Python:

from base64 import b64decode

# Open the image and read its contents into memory
# Note: the moonspots.png file needs to be in the same folder as this script
with open("moonspots.png", "rb") as input_file:
    data = input_file.read()

# Look for the position where the PNG file ends
# Note: we are adding 8 to account for the 4 characters of IEND, and the 4 bytes of the CRC32 checksum.
end_index = data.rfind(b"IEND") + 8

# Get everything that comes after the end of the PNG file
b64_text = data[end_index:]

# Decode the data using Base64
new_image = b64decode(b64_text)

# Save the decoded data into a new file
with open("puzzle.png", "wb") as output_file:
    output_file.write(new_image)

The script searches for the characters IEND, which mark the end of a PNG file, jumps over the 4-byte checksum after them, and takes everything past that point. That data then goes through a Base64 decoder, and the output is saved to a new file.
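One caveat: I, E, N and D are all valid Base64 characters, so rfind could in principle land inside the appended text if it happened to contain IEND. A slightly sturdier alternative, sketched below, walks the PNG chunk list (each chunk is a 4-byte length, a 4-byte type, the data, and a 4-byte CRC) until it reaches the IEND chunk. It assumes a well-formed PNG, does not verify the CRCs, and restores any missing = padding on the assumption that it may have been left off. The function names are hypothetical.

```python
import struct
from base64 import b64decode

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def find_png_end(data: bytes) -> int:
    """Return the offset just past the IEND chunk, found by walking the chunk list."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC32
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        offset += 12 + length  # skip length + type + data + CRC
        if chunk_type == b"IEND":
            return offset
    raise ValueError("no IEND chunk found")

def decode_trailing_base64(data: bytes) -> bytes:
    """Base64-decode whatever follows the PNG's IEND chunk."""
    trailer = b"".join(data[find_png_end(data):].split())  # drop whitespace and newlines
    trailer += b"=" * (-len(trailer) % 4)  # assumption: padding may have been left off
    return b64decode(trailer)

# Usage (assuming moonspots.png is in the current folder):
# with open("moonspots.png", "rb") as input_file:
#     hidden_image = decode_trailing_base64(input_file.read())
```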

Just a friendly reminder that it is not strictly necessary to do any of this yourself. You can use an online decoder and get the same results. But it is nice to remember that you do not always need to rely on online tools, which might disappear without notice. There will also likely be cases in which no existing tool is available. So it can help to learn some programming fundamentals, and Python is great for that.

It is almost as if we are going to need to build our own tools for some upcoming steps ;)

Moon phases

After decoding the Base64 data from the previous step, we got this image:

The symbols on the image are the phases of the moon. We need to somehow figure out what message is written with those symbols. Let's crack the cipher.

When decoding a cipher, it helps to look for patterns and to remember what you found before. We have come across the letters to numbers cipher often throughout the ARG, so it is reasonable to assume that the moons represent digits in that cipher. The image has 9 different symbols. That could weaken the assumption, since there are 10 decimal digits, but it is still possible that one of the digits went unused.

Counting the columns from 1, we can notice that the New Moon symbol (🌑︎) shows up more often in the odd columns. Earlier in the ARG we ran into a letters to numbers cipher that used the digit 0 for padding, so each letter was represented by exactly two digits (A=01, B=02, ..., Z=26). If the same is happening here, then the New Moon symbols in the odd columns would be the leading zeros.
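The padding idea can be checked on a small example. This sketch (the function name encode_a1z26 and the sample word are illustrative) encodes a word with two digits per letter and counts the zeros in the odd and even columns; the leading zeros of A to I all land in the odd columns.

```python
from string import ascii_uppercase

def encode_a1z26(text: str) -> str:
    """Letters to numbers with a leading zero: A=01, B=02, ..., Z=26 (A-Z only)."""
    return "".join(f"{ascii_uppercase.index(letter) + 1:02d}" for letter in text.upper())

digits = encode_a1z26("HEARTBOUND")
odd_columns, even_columns = digits[0::2], digits[1::2]  # columns counted from 1

print(digits)                                           # 08050118200215211404
print(odd_columns.count("0"), even_columns.count("0"))  # 5 1
```

Only T (20) puts a zero in an even column here, while H, E, A, B and D each put one in an odd column.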

So the hypothesis that this is letters to numbers is looking better. It might be worth doing further analysis to identify the other symbols.

Frequency analysis

When trying to decode a cipher, a common technique is to count how often each symbol appears, and then to correlate that with how often each letter normally appears in regular text. This is called Frequency Analysis.

But our case has a twist: the symbols do not directly represent letters, but digits. So we are going to figure out how often each decimal digit (0 to 9) shows up in English text whose letters have first been converted to numbers (A=01, B=02, ..., Z=26).

For that we need a sufficiently large sample of English text. The larger, the better, as long as it has proper grammar and spelling. Any text sample that meets those criteria can be used. In this article we are going to use the Douay-Rheims Bible from Project Gutenberg.

Needless to say, it would be next to impossible to do the analysis by hand, so we are going to write a Python script for it. The script needs to convert all characters to the same case, so that there is no ambiguity between lowercase and uppercase letters. Then it keeps only the letters from A to Z and replaces them with their alphabet positions (A=01, B=02, ..., Z=26). Finally, it counts how many times each digit shows up and calculates their percentages.

The script goes like this:

from collections import Counter
from string import ascii_uppercase

# Load the text into memory
# (the text sample we are using needs to be in the same folder as this script)
with open("pg8300.txt", "rt", encoding="utf-8") as file:
    # Skip the header added by Project Gutenberg by discarding the lines up to
    # "START OF THE PROJECT GUTENBERG EBOOK". If you are using a text sample from
    # somewhere else, you can probably remove this loop and read the entire file.
    for line in file:
        if "START OF THE PROJECT GUTENBERG EBOOK" in line:
            break
    text = file.read()

# Convert all text to uppercase and get only the A-Z letters, then count the letters
letters = "".join(letter for letter in text.upper() if letter in ascii_uppercase)
letter_count = Counter(letters)

# Count the digits in the letters to numbers cipher (A1Z26)
digit_count = Counter()

for letter, count in letter_count.items():
    # Alphabet position counted from 1, with a leading zero for the single digit
    # numbers so every letter is 2 digits long (A=01, B=02, ..., Z=26)
    alphabet_position = str(ascii_uppercase.index(letter) + 1).zfill(2)
    # Count both digits, so K (11) and V (22) add two of the same digit
    for digit in alphabet_position:
        digit_count[digit] += count

# Print the percentages of digits
digit_total = sum(digit_count.values())

for digit, count in digit_count.most_common():
    percent = count * 100 / digit_total
    print(f"{digit}: {percent:>6.3f}%")

Even though the text sample used here has around 4 million characters, the script should finish in less than a second. The script gives us the following percentages for each digit (for a letters to numbers cipher, in English):

0: 29.377%
1: 22.967%
2: 11.426%
5: 11.038%
8:  6.849%
9:  6.216%
4:  5.915%
3:  3.254%
6:  2.048%
7:  0.909%

The results match what we could expect. 0 and 1 are the most common because the codes of most letters begin with one of them (A to I with 0, J to S with 1). 2 is the third most common because a considerable number of letters (T to Z) have codes beginning with it. 7 is the least common because it is only used to represent G (07) and Q (17), which are two uncommon letters in English.
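To see where each digit comes from, this short sketch lists, for every digit, the letters whose two-digit code contains it. It confirms, for instance, that 7 only comes from G and Q, and that 0 comes from A to J plus T.

```python
from string import ascii_uppercase

# For each digit, collect the letters whose two-digit code (A=01, ..., Z=26) contains it.
# A letter is listed twice when both of its digits match, like K (11) or V (22).
digit_sources = {digit: [] for digit in "0123456789"}
for position, letter in enumerate(ascii_uppercase, start=1):
    for digit in f"{position:02d}":
        digit_sources[digit].append(letter)

for digit, letters in digit_sources.items():
    print(digit, "".join(letters))  # e.g. the line for 7 reads: 7 GQ
```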

Now we also need to calculate the frequencies of the moon symbols in our cipher. To make the symbols easier to deal with programmatically, let's assign a different letter to each moon symbol. Which letters we use does not really matter, as long as each symbol is always represented by the same letter. Our moon cipher can be represented as:

FGDFEGFBFEEHEEFGEBFHFA
EHFAFBFEEHEEFAEHDFFHFE
EBFGDFFHEADEFIFHDFFGDF
DCEADEEDFBFDFAFBFEEHEE
FAEHDFFHFEEBFHFAEGFEFG
FBFGDFDCEADEEDFBFDFAFB
EAEBDFFDFAEDFGFADDFAFH
FGECECEAEAEBEGFGFBFAEH

Let's make a Python script for calculating the frequencies of the moon symbols:

from collections import Counter

# Each letter in the string represents a different moon symbol
moon_symbols = """
FGDFEGFBFEEHEEFGEBFHFA
EHFAFBFEEHEEFAEHDFFHFE
EBFGDFFHEADEFIFHDFFGDF
DCEADEEDFBFDFAFBFEEHEE
FAEHDFFHFEEBFHFAEGFEFG
FBFGDFDCEADEEDFBFDFAFB
EAEBDFFDFAEDFGFADDFAFH
FGECECEAEAEBEGFGFBFAEH
""".replace("\n", "")
# Note: we are removing the line breaks so we only get the letters

# Count how many of each symbol, and the total amount
symbol_count = Counter(moon_symbols)
symbol_total = len(moon_symbols)

# Print the percentages of each symbol
for symbol, count in symbol_count.most_common():
    percent = count * 100 / symbol_total
    print(f"{symbol}: {percent:>6.3f}%")

The percentages we get for the moon symbols are:

F: 30.114%
E: 23.295%
D: 11.932%
A:  9.659%
H:  7.955%
B:  7.386%
G:  6.818%
C:  2.273%
I:  0.568%

The percentages appear to be relatively close to what we found in the text sample. For example, the four most common symbols (F, E, D, and A) are likely to be 0, 1, 2, and 5, respectively.

Cracking the code

Now we are going to use the percentages to try to match each symbol to its respective digit. That is, each symbol gets matched with a digit that has a similar percentage. This will not necessarily result in the correct matching, but hopefully it gets close enough that we can manually fix the mismatched ones.

First, let's try the following:

F = 0
E = 1
D = 2
A = 5
H = 8
B = 9
G = 4
C = 3
I = 7

If we replace those values on our cipher, we get:

0420140901181104190805
1805090118110518200801
1904200815210708200420
2315211209020509011811
0518200801190805140104
0904202315211209020509
1519200205120405220508
0413131515191404090518

Now let's see if we get anything meaningful by decoding it using letters to numbers (A=01, B=02, ..., Z=26):

DTNIARKDSHE
REIARKERTHA
SDTHOUGHTDT
WOULIBEIARK
ERTHASHENAD
IDTWOULIBEI
OSTBELDEVEH
DMMOOSNDIER
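The substitution and decoding steps can be sketched in Python as follows. The row strings are the same letter representation of the moon symbols used in the frequency script, and first_try is the tentative mapping listed above; the helper names are hypothetical. Editing the dictionary is an easy way to test other guesses.

```python
# Letters stand for the moon symbols, as in the frequency script
MOON_ROWS = [
    "FGDFEGFBFEEHEEFGEBFHFA",
    "EHFAFBFEEHEEFAEHDFFHFE",
    "EBFGDFFHEADEFIFHDFFGDF",
    "DCEADEEDFBFDFAFBFEEHEE",
    "FAEHDFFHFEEBFHFAEGFEFG",
    "FBFGDFDCEADEEDFBFDFAFB",
    "EAEBDFFDFAEDFGFADDFAFH",
    "FGECECEAEAEBEGFGFBFAEH",
]

# Tentative symbol-to-digit mapping from the first attempt
first_try = {"F": "0", "E": "1", "D": "2", "A": "5", "H": "8",
             "B": "9", "G": "4", "C": "3", "I": "7"}

def substitute(row: str, mapping: dict) -> str:
    """Replace every symbol letter with its tentative digit."""
    return "".join(mapping[symbol] for symbol in row)

def decode_a1z26(digits: str) -> str:
    """Read the digits two at a time and turn 01-26 into A-Z ('?' for anything else)."""
    letters = []
    for i in range(0, len(digits), 2):
        number = int(digits[i:i + 2])
        letters.append(chr(ord("A") + number - 1) if 1 <= number <= 26 else "?")
    return "".join(letters)

for row in MOON_ROWS:
    digits = substitute(row, first_try)
    print(digits, decode_a1z26(digits))  # first line: 0420140901181104190805 DTNIARKDSHE
```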

We are almost there! Among the random letters, we can notice parts of words like "ARK", "THOUGHT", or "MOO". That means we are on the right track; we just need to figure out the digits for the characters that did not turn into anything meaningful.

Comparing the output with the respective digits, we notice that the digits 4 and 9 are the ones that turned into meaningless sequences. Let's turn the digits back into the symbols we tried to assign, and see if we can find a pattern:

Tentative match between the moon symbols and the digits

We can see that the symbols for {0, 1, 2, 3, 5} seem to follow the order of the moon phases, starting from the New Moon, then going through the Crescent Moon towards the Full Moon. The symbol for 4 is the one out of sequence, and we have already seen that it did not produce meaningful results. We also found that 9 did not make sense. Also, when considering all the moon phases, it is worth noting that the Waning Moon symbol (☾) is completely absent from our cipher.

We can try rearranging the symbols in the correct order of moon phases, starting from New Moon, while also including the Waning Moon:

Moon symbols and their corresponding digits

If we try again to replace the symbols by their digits, using this new order, we get:

0920190401181109140805
1805040118110518200801
1409200815210708200920
2315211204020504011811
0518200801140805190109
0409202315211204020504
1514200205120905220508
0913131515141909040518

And decoding it using letters to numbers (A=01, B=02, ..., Z=26):

ITSDARKINHE
REDARKERTHA
NITHOUGHTIT
WOULDBEDARK
ERTHANHESAI
DITWOULDBED
ONTBELIEVEH
IMMOONSIDER

Now it makes complete sense! Adding spaces and punctuation to make it easier to read:

It's dark in here.
Darker than I thought it would be.
Darker than he said it would be.
Don't believe him, moonsider.

We got some lore for the story; it sounds like The Artifact is talking to us. The term that stands out is moonsider, at the end.

To the Moon

We still have the text in the page's comments to decipher:

ISZPGUHXFYCCAKQQWUQACBFALHV
FFOIWTWSDACBFALH

The key for decoding it is the term MOONSIDER, which we just found in the previous step. Our text was encoded using the Vigenère cipher, which has already been used a couple of times during the ARG.

That cipher works by shifting the alphabet position of each letter by an amount given by the corresponding letter of the key, counting A as a shift of 0. For example, E (the 5th letter of the alphabet) shifts a letter forward by 4 when encoding (A would become E), so decoding shifts it back by 4. If you go past Z, you wrap around to A. If the key is not long enough to cover the whole text, it just keeps repeating.
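A minimal sketch of Vigenère decoding in Python, assuming the input contains only the letters A to Z (the function name is hypothetical). Each comment line is decoded separately here, restarting the key; since the first line has 27 letters, exactly three repetitions of the 9-letter key, continuing the key across lines would give the same result.

```python
from string import ascii_uppercase

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Undo a Vigenère shift; key letter A means a shift of 0, B a shift of 1, and so on."""
    plaintext = []
    for i, letter in enumerate(ciphertext):
        shift = ascii_uppercase.index(key[i % len(key)])  # the key repeats as needed
        position = ascii_uppercase.index(letter)
        plaintext.append(ascii_uppercase[(position - shift) % 26])  # wrap around past A
    return "".join(plaintext)

for line in ["ISZPGUHXFYCCAKQQWUQACBFALHV", "FFOIWTWSDACBFALH"]:
    print(vigenere_decrypt(line, "MOONSIDER"))
# prints WELCOMETOMOONSINSDEMOONSIDE, then TRAVELTOMOONSIDE
```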

It is worth noting that, although there is no hint pointing towards Vigenère, at this point one could just try different keyed ciphers until finding the correct one, and Vigenère, being one of the most common, would come up soon enough. For a more elegant approach to guessing the cipher, you can do some statistical analysis on the ciphertext, which would also hint towards Vigenère.
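One such statistic is the index of coincidence: the probability that two letters picked at random from a text are the same. A simple substitution only relabels letters, so it keeps the value of English text (roughly 0.066), while a cipher like Vigenère spreads the letters out and pushes the value towards 1/26 (about 0.038). With only 43 letters the result is a rough hint at best. The sketch below assumes both comment lines are analyzed together; the function name is hypothetical.

```python
from collections import Counter

def index_of_coincidence(text: str) -> float:
    """Chance that two letters drawn from the text (without replacement) are equal."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))

ciphertext = "ISZPGUHXFYCCAKQQWUQACBFALHV" + "FFOIWTWSDACBFALH"
print(f"{index_of_coincidence(ciphertext):.4f}")  # prints 0.0443
```

The value sits well below what plain or simply substituted English would give, which is consistent with a polyalphabetic cipher such as Vigenère.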

Decoding our ciphertext with the Vigenère key MOONSIDER yields:

WELCOMETOMOONSINSDEMOONSIDE
TRAVELTOMOONSIDE

Manually adding spaces and punctuation to make it readable:

WELCOME TO MOON SINS, DE-MOON-SIDE
TRAVEL TO MOONSIDE

It instructs us to go to MOONSIDE, which is the solution to the puzzle and a reference to the game EarthBound. By adding MOONSIDE to the URL, we get to the next page: https://gopiratesoftware.com/games/Heartbound/CORE_DUMP/MEMORY_ALPHA/CRASH_LOG/BOOT_RECORD/DEEP_THOUGHT/2140425/U4042/HEAPCHK/U4043/RESISTOR_TWISTER/FINAL_MACHINE/MOONSIDE/

The page has an image of The Artifact's eye, and brings us to the final puzzle of Chapter 6: Undoing.

Revision as of 15:28, 5 October 2024 by Luna6667 (Fixed link to earthbound moonside reference); latest revision as of 18:21, 10 October 2024 by Djinnet (Add the ARG category).