단백질 시퀀스와 가중치 스트링에 대한 탐색 알고리즘

단백질 시퀀스와 가중치 스트링에 대한 탐색 알고리즘

ㆍ 저자명: 김성권,Kim. Sung-Kwon
ㆍ 간행물명: 정보과학회논문지. Journal of KIISE. 시스템 및 이론
ㆍ 권/호정보: 2002년|29권 8호|pp.456-462 (7 pages)
ㆍ 발행정보: 한국정보과학회
ㆍ 파일정보: 정기간행물|
PDF텍스트
ㆍ 주제분야: 기타

이 논문은 한국과학기술정보연구원과 논문 연계를 통해 무료로 제공되는 원문입니다.

서지반출

기타언어초록

단백질 시퀀스처럼 가중치를 가지는 스트링에 대한 탐색 알고리즘을 개발한다. ${sum}$를 알파벳이라 하고 모든 $a{in}{sum}$에 대해서 무게 ${mu}(a)$가 주어진다고 하자. 스트링 $A=a_1a_2…a_n; 에서 (단, 모든 ai{in}{sum})$, 서브스트링 $A(i.j)=a_ia_{i+1}…a_j$로 정의하면, 이것의 무게는 ${in}(A(i.j))={in}(a_i)+{in}(a_i+1)+…+{in}(a_j)$가 된다. 다루고자하는 문제는 스트링 A를 사전 처리하여 탐색 자료구조를 만드는데, 이 자료구조는 나중에 질문 무게 M이 주어진 경우, $M={in}(A(i,j))$인 서브스트링 A(i,j)가 있는가 라는 질문에 응답하는데 사용된다. 본 논문에서는 기존의 결과를 향상시키는 알고리즘을 제시한다. 기존의 알고리즘의 경우 O(n) 만큼의 메모리를 사용하는 탐색 자료구조를 이용하여 $0(frac{nlog;logn}{log; n})$ 시간에 질문응답을 하였으나, 본 논문의 알고리즘은 질문 응답시간은 그대로 유지하면서 메모리만 $0(frac{n}{log; n})$으로 줄인다.

기타언어초록

We are developing searching algorithms for weighted strings such as protein sequences. Let${sum}$ be an alphabet and for each $a{in}{sum}$ its weight ${mu}(a)$ is given. Given a string $A=a_1a_2…a_n; with each ai{in}{sum}$, a substring<$A(i.j)=a_ia_{i+1}…a_j$ has weight ${in}(A(i.j))={in}(a_i)+{in}(a_i+1)+…+{in}(a_j)$.The problem we are dealing with is to preprocess A to build a searching structure, and later, given a query weight M, the structure is used to answer the question of whether there is a substring A(i,j) such that$M={in}(A(i,j))$.In this paper an algorithm that improves over the previous result will be presented. The previously best known algorithm answers a query in $0(frac{nlog;logn}{log; n})$time using a searching structure that requires O(n) amount of memory. Our algorithm reduces the memory requirement to $0(frac{n}{log; n})$ while achieving the same query answer time.

키워드

단백질 시퀀스 가중치 스트링 탐색 알고리즘 Protein sequence weighted string searching algorithm

다운URL