Google algorithm problem

Original poster
gxgclg | 2012-5-22 21:37
An algorithm problem about suffix trees. Could an expert help solve it?
You will implement the compact representation of the compressed suffix trie ADT for DNA analyses.
A template of the compressed suffix trie class is shown as follows:
public class CompressedSuffixTrie
{
    /** You need to define your data structures for the compressed trie */

    /** Constructor */
    public CompressedSuffixTrie( String f ) // Create a compressed suffix trie from file f
    { }

    /** Method for finding the first occurrence of a pattern s in the DNA sequence */
    public int findString( String s )
    { }

    /** Method for finding the longest common subsequence of two DNA sequences stored
        in two text files f1 and f2 */
    public static String findLongestCommonSubsequence(String f1, String f2)
    { }
}
The data structures for the compressed suffix trie are not given in the above template. You need to define them yourself. You may introduce any helper methods to facilitate the implementation of these two methods.
The constructor creates a compact representation of the compressed suffix trie from an input text file f that stores a DNA sequence. All the characters of the DNA sequence are A, C, G and T. The findString(s) method has only one parameter: a pattern s. If s appears in the DNA sequence, findString(s) will return the starting index of the first occurrence of s in the DNA sequence. Otherwise, it will return -1. For example, if the DNA sequence is AAACAACTTCGTAAGTATA, then findString("CAACT") will return 3 and findString("GAAG") will return -1. Note that the index of the first character of the DNA sequence is 0.
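To make the requirement above concrete, here is a minimal sketch of one possible compact representation. It is not taken from the assignment: the names SuffixTrieSketch, Node and first are hypothetical, the sequence is passed in as a string rather than read from a file, and a '$' terminator is appended as an assumption. Each edge is stored as a pair of indices into the text, and each node remembers the smallest suffix start index in its subtree, so a lookup only walks the pattern once. Construction here is the naive quadratic insertion of every suffix, which is an assumption made to keep the sketch short; a linear-time construction is not shown.

```java
/** Hypothetical sketch; all names here are illustrative, not from the assignment. */
class SuffixTrieSketch {
    /** A node; the edge leading into it is the substring text[start, end). */
    private static final class Node {
        int start, end;                    // compact edge label: indices into text
        final int first;                   // smallest suffix start index in this subtree
        final Node[] child = new Node[5];  // A, C, G, T and the '$' terminator
        Node(int start, int end, int first) {
            this.start = start; this.end = end; this.first = first;
        }
    }

    private final String text;             // DNA sequence plus '$' (assumed terminator)
    private final Node root = new Node(0, 0, 0);

    /** Assumes dna contains only A, C, G, T. Naive construction, O(n^2) worst case. */
    SuffixTrieSketch(String dna) {
        this.text = dna + "$";
        for (int i = 0; i < text.length(); i++) insertSuffix(i);
    }

    private static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            case '$': return 4;
            default:  return -1;
        }
    }

    /** Suffixes are inserted in increasing order, so a node's first value never changes. */
    private void insertSuffix(int i) {
        Node node = root;
        int pos = i;
        while (true) {
            int c = code(text.charAt(pos));
            Node next = node.child[c];
            if (next == null) {            // no edge starts with this character: new leaf
                node.child[c] = new Node(pos, text.length(), i);
                return;
            }
            int len = next.end - next.start;
            int k = 0;
            while (k < len && text.charAt(next.start + k) == text.charAt(pos + k)) k++;
            if (k == len) {                // whole edge matched: descend
                node = next;
                pos += len;
                continue;
            }
            // Mismatch inside the edge: split it and hang a new leaf off the split point.
            Node mid = new Node(next.start, next.start + k, next.first);
            next.start += k;
            mid.child[code(text.charAt(next.start))] = next;
            mid.child[code(text.charAt(pos + k))] = new Node(pos + k, text.length(), i);
            node.child[c] = mid;
            return;
        }
    }

    /** Index of the first occurrence of s, or -1. Walks at most |s| characters. */
    int findString(String s) {
        Node node = root;
        int i = 0;
        while (i < s.length()) {
            int c = code(s.charAt(i));
            if (c < 0 || c == 4) return -1;          // not a DNA character
            Node next = node.child[c];
            if (next == null) return -1;
            for (int p = next.start; p < next.end && i < s.length(); p++, i++) {
                if (text.charAt(p) != s.charAt(i)) return -1;
            }
            node = next;
        }
        return node.first;                            // empty pattern yields 0 by convention
    }
}
```

With the sequence from the example, new SuffixTrieSketch("AAACAACTTCGTAAGTATA").findString("CAACT") gives 3 and findString("GAAG") gives -1. Because the alphabet is fixed, each child lookup is a constant-time array access, which is what keeps the search at O(|s|).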
Warning: If your findString(s) method is slower than O(|s|) (|s| is the length of s), you will get 0 marks for it.
The findLongestCommonSubsequence(String f1, String f2) method returns the longest common subsequence of two DNA sequences stored in the text files f1 and f2. For simplicity, you may assume that each file contains at most 1000 DNA characters. When your program reads a DNA sequence from a file, it needs to ignore all non-DNA characters such as the newline character. Notice that this method does not need to use any compressed suffix trie. The running time of your method findLongestCommonSubsequence(f1, f2) is required to be at most O(mn), where m and n are the sizes of f1 and f2, respectively. Any method with a higher time complexity will be given 0 marks.
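As a rough illustration of the O(mn) bound, the following sketch uses the standard dynamic-programming table for the longest common subsequence and then walks the table to recover one such subsequence. The class name LcsSketch and the helpers keepDna and lcs are hypothetical. Unlike the template, this version declares throws IOException, and it assumes the files are plain ASCII with uppercase letters; both are assumptions, not requirements from the problem statement.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical sketch; names are illustrative, not from the assignment. */
class LcsSketch {
    /** Keeps only A, C, G and T; newlines and every other character are dropped. */
    static String keepDna(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == 'A' || c == 'C' || c == 'G' || c == 'T') sb.append(c);
        }
        return sb.toString();
    }

    /** One longest common subsequence of a and b; O(mn) time and space. */
    static String lcs(String a, String b) {
        int m = a.length(), n = b.length();
        // len[i][j] = LCS length of the suffixes a[i..] and b[j..]
        int[][] len = new int[m + 1][n + 1];
        for (int i = m - 1; i >= 0; i--) {
            for (int j = n - 1; j >= 0; j--) {
                if (a.charAt(i) == b.charAt(j)) {
                    len[i][j] = len[i + 1][j + 1] + 1;
                } else {
                    len[i][j] = Math.max(len[i + 1][j], len[i][j + 1]);
                }
            }
        }
        // Walk the table from the top-left corner to rebuild the subsequence.
        StringBuilder out = new StringBuilder();
        int i = 0, j = 0;
        while (i < m && j < n) {
            if (a.charAt(i) == b.charAt(j)) {
                out.append(a.charAt(i));
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                i++;
            } else {
                j++;
            }
        }
        return out.toString();
    }

    /** Reads both files, filters out non-DNA characters, then runs the DP. */
    static String findLongestCommonSubsequence(String f1, String f2) throws IOException {
        String a = keepDna(new String(Files.readAllBytes(Paths.get(f1)), StandardCharsets.US_ASCII));
        String b = keepDna(new String(Files.readAllBytes(Paths.get(f2)), StandardCharsets.US_ASCII));
        return lcs(a, b);
    }
}
```

Filling the table touches each of the (m+1)(n+1) cells once, and the walk back takes at most m+n steps, so the total stays within O(mn). With at most 1000 characters per file the table holds about a million ints, which should fit comfortably in memory.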
You need to give the running time analyses of all the methods in terms of the Big O notation. Include your running time analyses in the source file of the CompressedSuffixTrie class and comment them out.

2
yybj | 2012-5-22 21:46
Could some expert translate this problem into Chinese first?

3
lzh8430 | 2012-5-22 22:11
Feeling so helpless. Just watching.

4
xsgy123 | 2012-5-23 14:30
My English is not up to standard, so this is a struggle to read.

5
火箭球迷 | 2012-5-24 23:05
A very difficult problem.

6
无冕之王 | 2012-5-25 15:39
I suspect most people do not have time to read the OP's problem in English.

7
hsbjb | 2012-5-25 16:01
This problem is still fairly hard.

8
dfsa | 2012-5-25 23:16
Looking at this problem carefully, it does not seem all that difficult.

9
yybj | 2012-5-26 23:35
I still have not figured out how to do it.

10
火箭球迷 | 2012-5-26 23:43
The OP could translate the problem into Chinese first and then repost it; that would make it easier to read.

11
sinadz | 2012-5-28 17:11
Still no expert has solved it.