Panini Linguistics Olympiad: Genes from Space

Problem:
Alien Protein Codes

Dr. Muzabique figured out how to manipulate six alien proteins to do three functions: construct, cut, and pack. The alien genetic code used to manipulate the proteins was found to have 6 "alphabet letters": A, T, G, C, D, N. Only he knew the top-secret algorithm, and two weeks ago, he was mysteriously found dead in his apartment.

The top-secret project is now in a fund crunch but the government decides to give it one more chance if the algorithm can be decoded. They send you the only page that was recovered from Dr. Muzabique's notebook, with one set of input instructions and output genetic codes, and ask you to decode the algorithm:

"Today was a great breakthrough! I found that large and small proteins inherently behave in different ways. The way they operate under different cellular functions..."

Input | Output
Celebi, construct Articuno | CNDATACACCNDTGTATTCGGGATGCNDTGCCCACCCCNNNCNDCTTTTCCACTNANCND
Articuno, construct and cut Terrakion | CNDTGTATTCGGGATGCNDACTGCAGGGACTCCNDGCATGACATCNNNNNNCNDTGCCCACCC
Terrakion, construct and pack Shaymin | CNDACTGCAGGGACTCCNDCAGGCTCNDGCATGACATCNNNNANCNDCTTTTCCACTNNNCND
Cressela, construct and cut and pack Azelf | CNDCCATATGATGCGACNDGTTGATCNDGCATGACATCNNNNANCNDTGCCCACCCCNANCNDCTTTTCCACTNNNCND
Celebi, cut Cressela | CNDATACACCNDCCATATGATGCGACNDGCATGACATCNANNNNCNDCTTTTCCACTNANCND
Shaymin, pack and construct Azelf | CNDCAGGCTCNDGTTGATCNDTGCCCACCCCNANCND
Celebi, construct and cut and pack Cressela | CNDATACACCNDCCATATGATGCGACND
Terrakion, pack Articuno | CNDACTGCAGGGACTCCNDTGTATTCGGGATGCNDCTTTTCCACTNNNCND

Assignment:

Preserve the legacy of Dr. Muzabique! Explain how the algorithm converts input commands into output genetic codes.

The Request

Could someone please check to see if my solution below is correct? Thank you!

My Solution

Sections of the gene sequence are separated by the codon CND, like below:

[Image: the output gene sequences, with the protein sections underlined and color-coded]

The underlined parts are the proteins involved. The first underlined part is the protein doing the action (the first protein mentioned in the command), while the second underlined part is the protein being acted upon (the second protein mentioned). I've color-coded the underlined parts to differentiate between proteins.
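The splitting step can be sketched in a few lines of Python. This is only an illustration of how I read the codes; the function name `split_sections` and the variable names are mine, not part of the problem.

```python
def split_sections(code: str) -> list[str]:
    """Split an output code on the CND separator, dropping the empty edge pieces."""
    return [part for part in code.split("CND") if part]


# First row of the notebook: "Celebi, construct Articuno"
row = "CNDATACACCNDTGTATTCGGGATGCNDTGCCCACCCCNNNCNDCTTTTCCACTNANCND"

# By my reading, the first two sections are the acting protein and the
# protein acted upon; whatever follows are the action sequences.
actor, target, *actions = split_sections(row)
print(actor)    # ATACAC
print(target)   # TGTATTCGGGATG
print(actions)  # ['TGCCCACCCCNNN', 'CTTTTCCACTNAN']
```

Splitting on the literal string CND is safe for these rows because the letter D never appears outside the separator.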

The other sections encode the actions. As hinted in Dr. Muzabique's notebook, the lengths of the underlined protein sequences affect the action sequences. If the first protein (the acting protein) is long, then the action sequences represent the actions in the command. If the acting protein is short, then the action sequences represent the actions not in the command.
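Here is a small Python sketch of that selection rule, under two assumptions of mine. First, "short" means the 6-letter protein sequences and "long" the 13-letter ones, which are the only two lengths in the notebook rows, so the cutoff of 6 is arbitrary. Second, the action sequences always come out in the order construct, cut, pack, as they do in the rows where more than one appears. The names `ALL_ACTIONS` and `encoded_actions` are made up for illustration.

```python
ALL_ACTIONS = ("construct", "cut", "pack")  # assumed fixed output order


def encoded_actions(actor_seq: str, commanded: set[str]) -> list[str]:
    """Return the actions that should appear as action sequences in the output."""
    actor_is_long = len(actor_seq) > 6  # assumption: 6 letters = short, 13 = long
    if actor_is_long:
        # Long acting protein: encode exactly the commanded actions.
        return [a for a in ALL_ACTIONS if a in commanded]
    # Short acting protein: encode the actions that were NOT commanded.
    return [a for a in ALL_ACTIONS if a not in commanded]


# Celebi (ATACAC, short) is told to construct, so cut and pack get encoded.
print(encoded_actions("ATACAC", {"construct"}))       # ['cut', 'pack']
# Terrakion (ACTGCAGGGACTC, long) is told to pack, so only pack gets encoded.
print(encoded_actions("ACTGCAGGGACTC", {"pack"}))     # ['pack']
```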

Each action sequence also seems to end in either NNN or NAN, or in two such triplets back to back. I'll call these endings "penguins" (because why not) and the rest of an action sequence the "polar bear". The polar bear identifies the action, while the penguin seems to depend on the sizes of the proteins. The third gene output gives us the key between the actions and the polar bears.

For constructing (polar bear: GCATGACATC), the penguin seems to have the form N_NN_N. The blanks represent the first and second protein lengths, in order: a blank is A if the protein is short and N if the protein is long. For cutting (polar bear: TGCCCACCCC), the penguin is N_N and depends on the second protein: the blank is A if it is short and N if it is long. For packing (polar bear: CTTTTCCACT), the penguin works like the cutting penguin, except that it depends on the first protein.
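To check the whole idea, here is a Python sketch of my proposed algorithm from end to end. The protein sequences are read off the notebook rows by splitting on CND. The length cutoff (6 letters is short, 13 is long), the fixed construct, cut, pack output order, and all names such as `encode` and `penguin` are my own assumptions.

```python
PROTEINS = {  # sequences read off the notebook rows
    "Celebi": "ATACAC", "Articuno": "TGTATTCGGGATG", "Terrakion": "ACTGCAGGGACTC",
    "Shaymin": "CAGGCT", "Cressela": "CCATATGATGCGA", "Azelf": "GTTGAT",
}
POLAR_BEARS = {"construct": "GCATGACATC", "cut": "TGCCCACCCC", "pack": "CTTTTCCACT"}
ORDER = ("construct", "cut", "pack")  # assumed fixed output order
SEP = "CND"


def size_letter(seq: str) -> str:
    """N for a long protein, A for a short one (assumed cutoff: 6 letters)."""
    return "N" if len(seq) > 6 else "A"


def penguin(action: str, actor: str, target: str) -> str:
    a, t = size_letter(actor), size_letter(target)
    if action == "construct":
        return f"N{a}NN{t}N"   # blanks: first protein, then second
    if action == "cut":
        return f"N{t}N"        # depends on the second protein
    return f"N{a}N"            # pack: depends on the first protein


def encode(actor_name: str, commanded: set[str], target_name: str) -> str:
    actor, target = PROTEINS[actor_name], PROTEINS[target_name]
    actor_is_long = size_letter(actor) == "N"
    # Long actor: commanded actions. Short actor: the actions not commanded.
    chosen = [x for x in ORDER if (x in commanded) == actor_is_long]
    sections = [actor, target] + [POLAR_BEARS[x] + penguin(x, actor, target) for x in chosen]
    return SEP + SEP.join(sections) + SEP


print(encode("Celebi", {"construct"}, "Articuno"))
# CNDATACACCNDTGTATTCGGGATGCNDTGCCCACCCCNNNCNDCTTTTCCACTNANCND
print(encode("Shaymin", {"pack", "construct"}, "Azelf"))
# CNDCAGGCTCNDGTTGATCNDTGCCCACCCCNANCND
```

Both printed codes match the corresponding notebook rows. I left out the "Articuno, construct and cut Terrakion" row because its output looks cut off in the copy above.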

Problem images: (1), (2)



