Department of Design Futures, Faculty of Design
MARUYAMA Osamu, Associate Professor
One of the factors behind the recent success of deep learning is that the objects being modeled are represented as vectors. Images and videos are vectors (more precisely, tensors) from the outset, and in natural language processing (NLP), words and sentences are likewise converted into vectors before being used in so-called AI systems. For example, attempts have been made to represent an entire Wikipedia article as a single multidimensional vector, which makes it possible to measure the similarity between articles. This "vectorization" is expected to keep expanding to a wide variety of objects.
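As a minimal sketch of how similarity between two vectorized articles might be measured, the snippet below computes cosine similarity, one common choice for comparing embedding vectors. The three-dimensional vectors and the names `article_a`, `article_b`, and `article_c` are made up for illustration; real article embeddings typically have hundreds of dimensions, and the text above does not specify which similarity measure is used.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumed values, not real data).
article_a = [1.0, 2.0, 0.0]
article_b = [2.0, 4.0, 0.0]  # same direction as article_a
article_c = [0.0, 0.0, 3.0]  # orthogonal to article_a

sim_ab = cosine_similarity(article_a, article_b)
sim_ac = cosine_similarity(article_a, article_c)
```

Here `sim_ab` is close to 1 because the two vectors point in the same direction, while `sim_ac` is 0 because the vectors are orthogonal.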
Against this background, we are developing methods for vectorizing biological data, including genome sequences, together with applications of the resulting vectors [Figure 1]. Figure 1 shows the importance of length-six DNA sequences extracted from a genome, computed from their vector representations.
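To make the notion of length-six DNA sequences concrete, the following sketch extracts and counts all overlapping subsequences of length six (6-mers) from a short sequence. It covers only the extraction step; how vector representations or importance scores are then obtained is not described here and is not implemented. The function name `count_kmers` and the toy sequence are assumptions for illustration.

```python
from collections import Counter


def count_kmers(sequence, k=6):
    """Count overlapping k-mers in a DNA sequence, skipping ambiguous bases."""
    sequence = sequence.upper()
    valid_bases = set("ACGT")
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        # Ignore windows containing non-ACGT symbols such as N.
        if set(kmer) <= valid_bases:
            counts[kmer] += 1
    return counts


# Hypothetical toy sequence (not real genome data).
toy_sequence = "ACGTACGTAC"
kmer_counts = count_kmers(toy_sequence, k=6)
```

For this toy input there are five windows of length six, and the 6-mer `ACGTAC` occurs twice. Counts of this kind could serve as input to a separate embedding step.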
Furthermore, we have recently been analyzing how much emotional information is retained in the vector representations of tweets posted on social media [Figures 2 and 3]. Going forward, we plan to make predictions in the virtual world using vector representations of all relevant objects.
1. Wan Kin Au Yeung, Osamu Maruyama, Hiroyuki Sasaki, A convolutional neural network-based regression model to infer the epigenetic crosstalk responsible for CG methylation patterns, BMC Bioinformatics, 10.1186/s12859-021-04272-8, 22, 341, 2021.
2. Osamu Maruyama, Yinuo Li, Hiroki Narita, Hidehiro Toh, Wan Kin Au Yeung, Hiroyuki Sasaki, CMIC: predicting DNA methylation inheritance of CpG islands with embedding vectors of variable-length k-mers, BMC Bioinformatics, 10.1186/s12859-022-04916-3, 23, 371, 2022.