# Coding Challenges: DNA and Amino Acids

### Problem 1

Given a string representing a single strand of DNA, count how many times each nucleotide appears. The string contains only the characters 'a', 'c', 'g', and 't' (uppercase or lowercase). Return the counts as a tuple (A, C, G, T), where each entry is the number of times that nucleotide appears in the strand.

| Example Input|      Expected Return Value   |
| :---:        |             :----:           | 
|"acgtgca"     |  (2, 2, 2, 1) |
|"GTCAatgggaa" |  (4, 1, 4, 2) |

In [2]:
def count_nucleotides(strand):
    # Normalize case so that 'A' and 'a' are counted together.
    strand = strand.lower()
    num_A = strand.count('a')
    num_C = strand.count('c')
    num_G = strand.count('g')
    num_T = strand.count('t')
    return num_A, num_C, num_G, num_T
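
As an alternative to calling `count()` four times, the tally can be done in a single pass. The sketch below assumes `collections.Counter` from the standard library is allowed; the name `count_nucleotides_counter` is hypothetical and not part of the challenge.

```python
from collections import Counter

def count_nucleotides_counter(strand):
    # Hypothetical alternative: tally every character in one pass.
    counts = Counter(strand.lower())
    # A Counter returns 0 for missing keys, so absent nucleotides are handled.
    return counts['a'], counts['c'], counts['g'], counts['t']

print(count_nucleotides_counter("acgtgca"))      # (2, 2, 2, 1)
print(count_nucleotides_counter("GTCAatgggaa"))  # (4, 1, 4, 2)
```

Both approaches return the same tuple; the single-pass version mainly matters for very long strands.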

### Problem 2

Given two strings representing strands of DNA, check whether they can be joined with their ends aligned, i.e. whether each nucleotide in the first strand pairs with the nucleotide at the same position in the second. The strings will only have characters 'a', 'c', 'g', and 't' (uppercase or lowercase). Remember that A pairs with T and C pairs with G. Return a boolean representing whether they can be joined. If the two strands of DNA are not of equal length, return False. (Strands can't be reversed, i.e. "atc" and "gat" don't match.)

| Example Input|      Expected Return Value   |
| :---:        |             :----:           | 
|"acgtgca", "TGcacgT"     |  True |
|"ata", "ggg" |  False |
|"atc", "gat" |  False |
|"ata", "tatg" |  False |

In [3]:
def check_match(strand1, strand2):
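    # Complementary base pairs: A pairs with T, C pairs with G.
    matches = {"a": "t", "t": "a", "c": "g", "g": "c"}
    # Build the complement of strand1, one nucleotide at a time.
    strand_match = ""
    for nucleotide in strand1.lower():
        strand_match += matches[nucleotide]
    # Strands of unequal length can never compare equal, so no separate length check is needed.
    return strand_match == strand2.lower()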
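
The same complement-and-compare idea can be sketched with Python's built-in `str.maketrans` and `str.translate`, which avoid building the complement character by character. The names `COMPLEMENT` and `check_match_translate` are hypothetical, chosen only for this illustration.

```python
# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("acgt", "tgca")

def check_match_translate(strand1, strand2):
    # Hypothetical alternative: complement strand1 in one call, then compare.
    return strand1.lower().translate(COMPLEMENT) == strand2.lower()

print(check_match_translate("acgtgca", "TGcacgT"))  # True
print(check_match_translate("ata", "tatg"))         # False
```

Because the comparison is on whole strings, strands of different lengths still return False.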

### Problem 3

Given a string representing a strand of DNA (containing only the characters 'a', 'c', 'g', and 't', uppercase or lowercase), return a list of strings containing the sequence of amino acids the DNA encodes. If the DNA has "leftover" nucleotides (a trailing group of fewer than three), ignore them. Ignore all nucleotides that come after a stop codon; the stop codon itself is included in the result as 'stp'.

| Example Input|      Expected Return Value   |
| :---:        |             :----:           | 
|"gcttgtgggac" | ['ala', 'cys', 'gly'] |
|"catcacg"     | ['his', 'his'] |
|"gcctagattctcccg"| ['ala', 'stp'] |

But first, let's establish some helper data and functions. Here we define the global variables `amino2codon` and `codon2amino`. The dictionary `amino2codon` is initialized for you, and you should not modify it. However, it is cumbersome and inefficient to solve this problem with that data structure directly, because it maps each amino acid to its codons rather than the other way around. Instead, you should reorganize the data in a useful way in the function `process_amino2codon()`; this new data structure is saved in `codon2amino`. Use this new data structure to implement `dna_to_amino()`.

In [4]:
amino2codon = {
    "ala": ["GCT", "GCC", "GCA", "GCG" ],
    "leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "lys": ["AAA", "AAG"],
    "asn": ["AAT", "AAC"],
    "met": ["ATG"],
    "asp": ["GAT", "GAC" ],
    "phe": ["TTT", "TTC"],
    "cys": ["TGT", "TGC" ],
    "pro": ["CCT", "CCC", "CCA", "CCG"],
    "gln": ["CAA", "CAG" ],
    "ser": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "glu": ["GAA", "GAG" ],
    "thr": ["ACT", "ACC", "ACA", "ACG"],
    "gly": ["GGT", "GGC", "GGA", "GGG" ],
    "trp": ["TGG"],
    "his": ["CAT", "CAC" ],
    "tyr": ["TAT", "TAC"],
    "ile": ["ATT", "ATC", "ATA" ],
    "val": ["GTT", "GTC", "GTA", "GTG"],
    "stp": ["TAA", "TGA", "TAG"]
}

def process_amino2codon():
    codon2amino = {}
    for amino, codons in amino2codon.items():
        for codon in codons:
            codon2amino[codon] = amino
    return codon2amino

codon2amino = process_amino2codon()
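
To illustrate why the inverted table helps, here is a minimal sketch on a small subset of the codon table. The names `amino2codon_subset`, `codon2amino_subset`, and `find_amino_slow` are hypothetical and exist only for this comparison.

```python
# A small subset of the amino2codon table, used only for illustration.
amino2codon_subset = {
    "his": ["CAT", "CAC"],
    "met": ["ATG"],
    "stp": ["TAA", "TGA", "TAG"],
}

def find_amino_slow(codon):
    # Without the inverted table, every lookup scans the codon lists.
    for amino, codons in amino2codon_subset.items():
        if codon in codons:
            return amino
    return None

# Inverted table: a single dictionary lookup per codon.
codon2amino_subset = {
    codon: amino
    for amino, codons in amino2codon_subset.items()
    for codon in codons
}

print(find_amino_slow("TGA"))      # stp
print(codon2amino_subset["TGA"])   # stp
print(len(codon2amino_subset))     # 6
```

The dictionary comprehension does the same job as the nested loops in `process_amino2codon()`, just written more compactly.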

Implement `dna_to_amino()` to complete the challenge. For full credit, you must also implement `process_amino2codon()` properly. You can get partial credit for a correct `dna_to_amino()` without implementing `process_amino2codon()`.

In [5]:
def dna_to_amino(strand):
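    amino_acids = []
    codon = ""
    for nucleotide in strand:
        codon += nucleotide
        # Translate each complete group of three nucleotides.
        if len(codon) == 3:
            codon = codon.upper()
            amino = codon2amino[codon]
            amino_acids.append(amino)
            # The stop codon is recorded, then translation ends.
            if amino == "stp":
                break
            codon = ""
    # A leftover partial codon is never translated.
    return amino_acids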
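
The grouping of nucleotides into codons, including how leftovers are dropped, can also be sketched with slicing. The helper `split_codons` is hypothetical and only shows the chunking step, not the lookup in `codon2amino`.

```python
def split_codons(strand):
    # Hypothetical helper: step through the strand three characters at a time.
    strand = strand.upper()
    usable = len(strand) - len(strand) % 3  # drop leftover nucleotides
    return [strand[i:i + 3] for i in range(0, usable, 3)]

print(split_codons("gcttgtgggac"))  # ['GCT', 'TGT', 'GGG']
print(split_codons("catcacg"))      # ['CAT', 'CAC']
```

Each returned codon could then be looked up in the inverted table, stopping at the first one that maps to "stp".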

### Problem 4

In a protein, disulfide bonds can form between two cysteine amino acids. These bonds are critical in maintaining the folded structure of the protein (see the image below). However, the two cysteines have to be "pointing towards each other" to form the bond. Consequently, there has to be enough space between the cysteines for the protein to fold over and for the two amino acids to face each other. Let's assume that for two cysteines to form a disulfide bond, they have to be separated by at least 5 amino acids (i.e. 5 amino acids between them).

![title](images/disulfide.png)

Given an array of strings representing the amino acids coded by a DNA strand, check whether at least two simultaneous disulfide bonds can form in the resulting protein. Return True if at least two simultaneous disulfide bonds are possible, and False otherwise. Ignore all amino acids that come after a stop codon. Each string will be the standard 3-character abbreviation (lowercase) for the amino acid (e.g. "cys" for cysteine, "stp" for stop).

| Example Input|      Expected Return Value   |
| :---:        |             :----:           | 
|["cys", "cys", "trp", "trp", "trp", "trp", "cys", "cys"] | True |
|["cys", "trp", "trp", "trp", "trp", "trp", "cys", "trp", "trp", "trp", "trp", "trp", "cys"]     | False |
|["cys", "cys", "trp", "trp", "stp", "trp", "trp", "cys", "cys"]| False |

In [6]:
def two_disulfides(aminos):
    if "stp" in aminos:
        stop_ind = aminos.index("stp")
        aminos = aminos[:stop_ind]
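    # Positions of every cysteine before the stop codon, in increasing order.
    cys_ind = []
    for i in range(len(aminos)):
        if aminos[i] == "cys":
            cys_ind.append(i)
    # Two simultaneous bonds need four distinct cysteines.
    if len(cys_ind) < 4:
        return False
    # Pair the first cysteine with the second-to-last and the second with the last.
    # Five amino acids between two positions means an index difference of at least 6.
    return cys_ind[-2]-cys_ind[0] >= 6 and cys_ind[-1]-cys_ind[1] >= 6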
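
A brute-force version can be handy for cross-checking a faster solution against the rule stated in the problem (at least 5 amino acids between the two bonded cysteines). The sketch below is a hypothetical reference implementation, not the intended answer; `two_disulfides_brute`, `can_bond`, and `MIN_GAP` are names introduced here for illustration.

```python
from itertools import combinations

MIN_GAP = 5  # amino acids required between two bonded cysteines

def two_disulfides_brute(aminos):
    # Hypothetical reference: try every set of four cysteines
    # and every way of splitting them into two pairs.
    if "stp" in aminos:
        aminos = aminos[:aminos.index("stp")]
    cys = [i for i, amino in enumerate(aminos) if amino == "cys"]

    def can_bond(i, j):
        # abs(j - i) - 1 amino acids sit strictly between positions i and j.
        return abs(j - i) - 1 >= MIN_GAP

    for a, b, c, d in combinations(cys, 4):
        pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        for (p, q), (r, s) in pairings:
            if can_bond(p, q) and can_bond(r, s):
                return True
    return False

print(two_disulfides_brute(["cys", "cys", "trp", "trp", "trp", "trp", "cys", "cys"]))         # True
print(two_disulfides_brute(["cys", "cys", "trp", "trp", "stp", "trp", "trp", "cys", "cys"]))  # False
```

This checks far more combinations than necessary, so it is only suitable for short inputs such as the test cases.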

# Tests

Don't touch this code. Run this cell to evaluate your code. These are the tests that will be used to score the correctness of your code. If you accidentally mess something up here, let me know. If this cell is found to be intentionally tampered with (to increase your score), your team will automatically be disqualified.

In [7]:
def eval_test(inputs, answers, eval_func):
    pass_test = True
    for i in range(len(inputs)):
        inp = inputs[i]
        ans = answers[i]
        out = eval_func(*inp)
        if out != ans:
            pass_test = False
            print("\tOn input {}, I expected {} but got {}".format(inp, ans, out))
    print("\tPASS" if pass_test else "\tFAIL")
    print()
    return pass_test

score = 0

# Problem 1
inputs = [
    ("acgtgca",),
    ("GTCAatgggaa",),
    ("gggTACcaGattagctatacgacggatc",),
    ("a",),
    ("atgcgtca",)
]
answers = [
    (2, 2, 2, 1),
    (4, 1, 4, 2),
    (8, 6, 8, 6),
    (1, 0, 0, 0),
    (2, 2, 2, 2)
]
print("Problem 1")
score += 7.5 if eval_test(inputs, answers, count_nucleotides) else 0

# Problem 2
inputs = [
    ("acgtgca", "TGcacgT"),
    ("ata", "ggg"),
    ("atc", "gat"),
    ("ata", "tatg"),
    ("atgcgtca", "TACGCAGT")
]
answers = [
    True,
    False,
    False,
    False,
    True
]
print("Problem 2")
score += 7.5 if eval_test(inputs, answers, check_match) else 0

# Problem 3
inputs = [
    ("gcttgtgggac",),
    ("catcacg",),
    ("gcctagattctcccg",),
    ("gctagtagcatgataagatgcgcataactggccgat",),
    ("CATcatCATcatCATcatCAT",)
]
answers = [
    ['ala', 'cys', 'gly'],
    ['his', 'his'],
    ['ala', 'stp'],
    ['ala', 'ser', 'ser', 'met', 'ile', 'arg', 'cys', 'ala', 'stp'],
    ['his', 'his', 'his', 'his', 'his', 'his', 'his']
]
print("Problem 3")
score += 7.5 if eval_test(inputs, answers, dna_to_amino) else 0

# Problem 4
inputs = [
    (["cys", "cys", "trp", "trp", "trp", "trp", "cys", "cys"],),
    (["cys", "trp", "trp", "trp", "trp", "trp", "cys", "trp", "trp", "trp", "trp", "trp", "cys"],),
    (["cys", "cys", "trp", "trp", "stp", "trp", "trp", "cys", "cys"],),
    (["gly", "cys", "his", "ser", "gly", "ile", "glu", "met", "cys", "cys", "cys"],),
    (["gly", "cys", "his", "ser", "cys", "ile", "glu", "met", "cys", "cys", "cys"],),
]
answers = [
    True,
    False,
    False,
    False,
    True
]
print("Problem 4")
score += 7.5 if eval_test(inputs, answers, two_disulfides) else 0

print("total: {}".format(score))

Problem 1
	PASS

Problem 2
	PASS

Problem 3
	PASS

Problem 4
	PASS

total: 30.0
