Question

Formatted question description: https://leetcode.ca/all/187.html

 187	Repeated DNA Sequences

 All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG".
 When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

 Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

 Example:

 Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

 Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Algorithm

Using HashSet can not have the characteristics of duplicate items.

Code

Java

public class Repeated_DNA_Sequences {

    public static void main(String[] args) {
        Repeated_DNA_Sequences out = new Repeated_DNA_Sequences();
        Solution s = out.new Solution();

        System.out.println(s.findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }

    class Solution_optimize {
        public List<String> findRepeatedDnaSequences(String s) {
            Set<Integer> words = new HashSet<>();

            // need this to remove duplicates
            // Input "AAAAAAAAAAAAA", result ["AAAAAAAAAA"]
            // to avoid result ["AAAAAAAAAA","AAAAAAAAAA","AAAAAAAAAA"]
            Set<Integer> repteated = new HashSet<>();
            List<String> result = new ArrayList<>();

            char[] map = new char[26];
            //map['A' - 'A'] = 0;
            map['C' - 'A'] = 1;
            map['G' - 'A'] = 2;
            map['T' - 'A'] = 3;

            for(int i = 0; i < s.length() - 9; i++) {
                int v = 0;
                for(int j = i; j < i + 10; j++) {
                    v <<= 2;
                    v |= map[s.charAt(j) - 'A'];
                }
                if(words.contains(v) && !repteated.contains(v)) {
                    repteated.add(v);
                    result.add(s.substring(i, i + 10));
                }

                words.add(v); // re-add, if already contains
            }
            return result;
        }
    }

    class Solution {
        HashSet<String> existed = new HashSet<>();
        HashSet<String> result = new HashSet<>();

        public List<String> findRepeatedDnaSequences(String s) {

            if (s.length() <= 10) {
                return new ArrayList<String>();
            }

            for (int i = 0; i < 10; i++) {
                for (int j = i; j + 9 < s.length(); j += 10) {

                    String one = s.substring(j, j + 10);

                    if (existed.contains(one)) {
                        result.add(one);
                    }

                    existed.add(one);
                }
            }

            return new ArrayList<String>(result);
        }

    }
}

All Problems

All Solutions