Synthesis
- AI Music Generation (Prof. Juhan Nam) 2022.12.07
- Speech processing tutorials, 2022 2022.08.22
- INTERSPEECH 2019 Tutorials 2022.08.22
- Naver NEST speech recognition 2020.04.14
- INTERSPEECH 2018 challenges 2019.10.31
- TTS: text-to-speech, HTS 2019.09.25
- PRAAT 2019.09.23
- Tacotron 2019.08.04
- SMARTEAR 2019.08.03
- Awesome Speech 2019.08.03
AI Music Generation (Prof. Juhan Nam)
Speech processing tutorials, 2022
https://ratsgo.github.io/speechbook/
Home
articles about speech recognition
ratsgo.github.io
https://tonywangx.github.io/slide.html
Talk & slides — HomePage-WangXin documentation
Talk & slides In most cases, I cannot directly share audio samples. Some samples can be found through the link in the PDF. Talk 2022-MAY ICASSP 2022 short course: neural vocoder This talk briefly summarizes a few representative neural vocoders. …
tonywangx.github.io
https://marg-machine-listening-tutorial.mystrikingly.com/
MACHINE LISTENING TUTORIAL (MARG) on Strikingly
marg-machine-listening-tutorial.mystrikingly.com
INTERSPEECH 2019 Tutorials
https://github.com/espnet/interspeech2019-tutorial
GitHub - espnet/interspeech2019-tutorial: INTERSPEECH 2019 Tutorial Materials
INTERSPEECH 2019 Tutorial Materials. Contribute to espnet/interspeech2019-tutorial development by creating an account on GitHub.
github.com
Jupyter Notebook Viewer
nbviewer.org
Naver NEST speech recognition
https://www.bloter.net/archives/377679
Naver unveils new speech recognition technology 'NEST'
General users can try out the 'NEST' technology for free on its homepage.
www.bloter.net
http://www.businesskorea.co.kr/news/articleView.html?idxno=44113
Naver Unveils New Speech Recognition Technology 'NEST' - Businesskorea
[Businesskorea, reporter Kim Eun-jin] Building on its world-class in-house speech technology research, Naver has unveiled 'NEST' (Neural End-to-end Speech Transcriber), a speech recognition engine that has evolved one step further. 'NEST' can, with limited …
www.businesskorea.co.kr
http://www.mediapen.com/news/view/516943
Naver unveils new speech recognition technology 'NEST'
[Mediapen, reporter Kwon Ga-rim] Naver has unveiled a new speech recognition technology. On the 13th, Naver opened to the public 'NEST', the speech recognition engine that has been used in existing Naver services such as broadcast news. Naver said, "NEST is a technology that can accurately recognize complex and varied long-form speech and convert it to text with only limited training data," adding that "large amounts of curated data …
www.mediapen.com
http://www.epnc.co.kr/news/articleView.html?idxno=95425
It transcribes smartly, just as you speak: Naver 'NEST' - TechWorld
[TechWorld, reporter Lee Geon-han] Naver has unveiled 'NEST (Neural End-to-end Speech Transcriber)', an improved speech recognition engine built on its in-house technology research. The core of NEST is that, with only limited data, it can handle complex and varied long-form …
www.epnc.co.kr
INTERSPEECH 2018 challenges
https://interspeech2019.org/program/special_sessions_and_challenges/
INTERSPEECH 2019 - Special Sessions & Challenges
Voice quality characterization for clinical voice assessment: Voice production, acoustics, and auditory perception Go to schedule The assessment of voice quality is relevant to the clinical care of disordered voices. It contributes to the selection and optimization of clinical treatment …
interspeech2019.org
The Interspeech 2019 Computational Paralinguistics Challenge (ComParE)
Styrian Dialects, Continuous Sleepiness, Baby Sounds & Orca Activity
Interspeech ComParE is an open Challenge dealing with states and traits of speakers as manifested in their speech signal’s properties. In this 11th edition, we introduce four new tasks and Sub-Challenges:
- Styrian Dialects Recognition in Spoken Language,
- Continuous Sleepiness Estimation in Speech,
- Baby Sound Recognition,
- Orca Activity Detection.
Sub-Challenges allow contributors to find their own features with their own machine learning algorithm. However, a standard feature set and tools are provided that may be used. Participants have five trials on the test set per Sub-Challenge. Participation has to be accompanied by a paper presenting the results, which undergoes the Interspeech peer review.
Contributions using the provided or equivalent data are sought for (but not limited to):
- Participation in a Sub-Challenge
- Contributions centered around the Challenge topics
Results of the Challenge and Prizes will be presented at Interspeech 2019 in Graz, Austria.
Please visit: http://www.compare.openaudio.eu/compare2019/
Organisers:
Björn Schuller (U Augsburg, Germany / Imperial College, UK / audEERING)
Anton Batliner (U Augsburg, Germany)
Christian Bergler (FAU, Germany)
Florian Pokorny (MU Graz, Austria)
Jarek Krajewski (U Wuppertal / RUAS Cologne, Germany)
Meg Cychosz (UC Berkeley, USA)
2019 – compare.openaudio.eu
Styrian Dialects, Continuous Sleepiness, Baby Sounds & Orca Activity Interspeech 2019 Computational Paralinguistics Challenge (ComParE) The Interspeech 2019 Computational Paralinguistics ChallengE (ComParE) is an open Challenge dealing with states and traits of speakers as manifested in their speech signal's properties …
www.compare.openaudio.eu
The VOiCES from a Distance Challenge
The VOiCES from a Distance Challenge will focus on benchmarking and further improving state-of-the-art technologies in the area of speaker recognition and automatic speech recognition (ASR) for far-field speech. The challenge is based on the recently released corpus Voices Obscured in Complex Environmental Settings (VOiCES), where noisy speech was recorded in real reverberant rooms with multiple microphones. Noise sources included babble, music, or television. The challenge will have two tracks for speaker recognition and ASR:
Fixed System - Training data is limited to specific datasets
Open System - Participants can use any external datasets they have access to (private or public)
The participating teams will get early access to the VOiCES phase II data, which will form the evaluation set for the challenge. The special session will be dedicated to the discussion of applied technology, performance thereof and any issues highlighted as a result of the challenge.
For more information visit: https://voices18.github.io/Interspeech2019-Special-Session/
Organizers:
Aaron Lawson (SRI International)
Colleen Richey (SRI International)
Maria Alejandra Barros (Lab41, In-Q-Tel)
Mahesh Kumar Nandwana (SRI International)
Julien van Hout (SRI International)
Description
Interspeech 2019 Special Session
voices18.github.io
The 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge: ASVspoof Challenge
The INTERSPEECH 2019 special session on Automatic Speaker Verification Spoofing and Countermeasures (ASVspoof) will accelerate anti-spoofing research for automatic speaker verification (ASV).
The first challenge, ASVspoof 2015, focused on speech synthesis and voice conversion spoofing attacks. The second challenge, ASVspoof 2017, focused on replay spoofing attacks. ASVspoof 2019, the third in the series, will be the first challenge with a broad focus on all three types of spoofing attacks. Continuing the 2015 and 2017 editions, ASVspoof 2019 promotes the development of generalised spoofing countermeasures, namely countermeasures that perform reliably in the face of unpredictable variation in attack types and algorithms.
ASVspoof 2019 has two sub-challenges:
- Logical access and speech synthesis/voice conversion attack:
The data used for ASVspoof 2015 included spoofing attacks generated with text-to-speech (TTS) and voice conversion (VC) systems that were state of the art at that time. Since then, considerable progress has been reported by both the TTS and VC communities. The quality of synthetic speech produced with today's best technology is now perceptually indistinguishable from bona fide speech. Since these technologies can be used to project convincing speech signals over the telephone, they pose substantial threats to the reliability of ASV. This scenario is referred to as logical access. The assessment of countermeasures, namely automatic systems that can detect non bona fide, spoofed speech produced with the latest TTS and VC technologies, is therefore needed urgently.
- Physical access and replay attack:
The ASVspoof 2017 database included various types of replayed audio files recorded at several places via many different devices. Progress in the development of countermeasures for replay detection has been rapid, with substantial improvements in performance being reported each year. The 2019 edition of ASVspoof features a distinct physical access and replay attack condition in the form of a far more controlled evaluation setup than that of the 2017 condition. The physical access scenario is relevant not just to ASV, but also to the emerging problem of fake audio detection that is faced in a host of additional applications including voice interaction and authentication with smart objects (e.g. smart-speakers and voice-driven assistants).
In addition, ASVspoof 2019 will adopt a new t-DCF evaluation metric that reflects the impact of spoofing and of countermeasures on ASV performance.
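For orientation, the sketch below shows the general shape of such a detection-cost metric in Python. It is not the official t-DCF implementation: in the real metric the weights are derived from the ASV system's miss and false-alarm rates and the prior probability of spoofing attacks (see the evaluation plan at asvspoof.org), whereas `c_miss` and `c_fa` here are illustrative placeholders.

```python
# Simplified detection-cost sketch for a spoofing countermeasure (CM).
# NOT the official t-DCF: the real metric derives its weights from the ASV
# system's error rates and the spoofing prior; c_miss/c_fa are placeholders.
import numpy as np

def min_detection_cost(bonafide_scores, spoof_scores, c_miss=1.0, c_fa=10.0):
    """Sweep a threshold over CM scores and return the minimum weighted cost."""
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    costs = []
    for t in thresholds:
        p_miss = np.mean(bonafide_scores < t)   # bona fide rejected by the CM
        p_fa = np.mean(spoof_scores >= t)       # spoofed audio accepted by the CM
        costs.append(c_miss * p_miss + c_fa * p_fa)
    return min(costs)

# Toy usage with random scores (higher score = "more bona fide").
rng = np.random.default_rng(0)
print(min_detection_cost(rng.normal(2, 1, 1000), rng.normal(0, 1, 1000)))
```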
For more details, please see the challenge site at http://www.asvspoof.org
Organizers (*)
Junichi Yamagishi (NII, Japan & Univ. of Edinburgh, UK)
Massimiliano Todisco (EURECOM, France)
Md Sahidullah (Inria, France)
Héctor Delgado (EURECOM, France)
Xin Wang (National Institute of Informatics, Japan)
Nicholas Evans (EURECOM, France)
Tomi Kinnunen (University of Eastern Finland, Finland)
Kong Aik Lee (NEC, JAPAN)
Ville Vestman (University of Eastern Finland, Finland)
(*) Equal contribution
| ASVspoof
ASVspoof 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge Future horizons in spoofed/fake audio detection 17th July: Release of ASVspoof 2019 real PA database. In this public release, we extended the simulated data used in the challenge …
www.asvspoof.org
The Zero Resource Speech Challenge 2019: TTS without T
Typical speech synthesis systems are built with an annotated corpus made of audio from a target voice plus text (and/or aligned phonetic labels). Obtaining such an annotated corpus is costly and not scalable considering the thousands of 'low resource' languages lacking in linguistic expertise or without a reliable orthography.
The ZeroSpeech 2019 challenge addresses this problem by proposing to build a speech synthesizer without any text or phonetic labels, hence, 'TTS without T' (text-to-speech without text). In this challenge, similarly, we provide raw audio for the target voice(s) in an unknown language, but no alignment, text or labels.
Participants will have to rely on automatically discovered subword units and align them to the voice recording in a way that works best for the purpose of synthesizing novel utterances from novel speakers. The task extends previous challenge editions with the requirement to synthesize speech, which provides an additional objective, thereby helping the discovery of acoustic units that are linguistically useful.
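As a very rough illustration of the "discovered subword units" idea, the sketch below clusters MFCC frames with k-means so that each frame receives a discrete pseudo-label. Real ZeroSpeech systems are far more sophisticated (e.g. Bayesian models or neural autoencoders), and the file name and the choice of 50 units are assumptions for the example.

```python
# Toy acoustic-unit discovery: cluster MFCC frames into pseudo-phone labels.
# A rough illustration only; real ZeroSpeech systems use far stronger models.
import librosa
from sklearn.cluster import KMeans

y, sr = librosa.load("speech.wav", sr=16000)            # assumed local file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # shape (13, n_frames)
units = KMeans(n_clusters=50, n_init=10).fit_predict(mfcc.T)
print(units[:20])   # a discrete "transcription" of the first 20 frames
```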
For more information please visit: http://www.zerospeech.com/2019/
Organizers:
Ewan Dunbar (Laboratoire de Linguistique Formelle, Cognitive Machine Learning [CoML])
Emmanuel Dupoux (Cognitive Machine Learning [CoML], Facebook A.I. Research)
Robin Algayres (Cognitive Machine Learning [CoML])
Sakriani Sakti (Nara Institute of Science and Technology, RIKEN Center for Advanced Intelligence Project)
Xuan-Nga Cao (Cognitive Machine Learning [CoML])
Mathieu Bernard (Cognitive Machine Learning [CoML])
Julien Karadayi (Cognitive Machine Learning [CoML])
Juan Benjumea (Cognitive Machine Learning [CoML])
Lucas Ondel (Department of Computer Graphics and Multimedia, Brno University of Technology)
Alan W. Black (Language Technologies Institute, Carnegie Mellon University)
Laurent Besacier (Laboratoire d’Informatique de Grenoble, équipe GETALP)
ZeroSpeech 2019: TTS without T — The Zero Speech Challenge documentation
Task and intended goal Young children learn to talk long before they learn to read and write. They can conduct a dialogue and produce novel sentences, without being trained on an annotated corpus of speech and text or aligned phonetic symbols. Presumably,
www.zerospeech.com
Spoken Language Processing for Children's Speech
This special session aims to bring together researchers and practitioners from academia and industry working on the challenging task of processing spoken language produced by children.
While recent years have seen dramatic advances in the performance of a wide range of speech processing technologies (such as automatic speech recognition, speaker identification, speech-to-speech machine translation, sentiment analysis, etc.), the performance of these systems often degrades substantially when they are applied to spoken language produced by children. This is partly due to a lack of large-scale data sets containing examples of children's spoken language that can be used to train models but also because children's speech differs from adult speech at many levels, including acoustic, prosodic, lexical, morphosyntactic, and pragmatic.
We envision that this session will bring together researchers working in the field of processing children's spoken language for a variety of downstream applications to share their experiences about what approaches work best for this challenging population.
For more information please visit: https://sites.google.com/view/wocci/home/interspeech-2019-special-session
Organizers:
Keelan Evanini (Educational Testing Service)
Maryam Najafian (MIT)
Saeid Safavi (University of Surrey)
Kay Berkling (Duale Hochschule Baden-Württemberg)
Interspeech 2019 Special Session
Spoken Language Processing for Children's Speech Special Session Motivation This special session aims to bring together researchers and practitioners from academia and industry working on the challenging task of processing spoken language produced by child
sites.google.com
Voice quality characterization for clinical voice assessment: Voice production, acoustics, and auditory perception
The assessment of voice quality is relevant to the clinical care of disordered voices. It contributes to the selection and optimization of clinical treatment as well as to the evaluation of the treatment outcome. Levels of description of voice quality include the biomechanics of the vocal folds and their kinematics, temporal and spectral acoustic features, as well as the auditory scoring of hoarseness, hyper- and hypo-functionality, creakiness, diplophonia, harshness, etc. Broad and fuzzy definitions of terms regarding voice quality are in use, which impede scientific and clinical communication.
The aim of the special session is to contribute to the improvement of the clinical assessment of voice quality via a translational approach, which focuses on quantifying and explaining relationships between several levels of description. The goal is to objectify voice quality via (i) the analysis and simulation of vocal fold vibrations by means of high-speed videolaryngoscopy in combination with kinematic or mechanical modelling, (ii) the synthesis of disordered voices joint with auditory experimentation involving disordered voice stimuli, as well as (iii) the statistical analysis and automatic classification of distinct types of voice quality via video and/or audio features.
Organizers:
Philipp Aichinger (philipp.aichinger@meduniwien.ac.at)
Abeer Alwan (alwan@ee.ucla.edu)
Carlo Drioli (carlo.drioli@uniud.it)
Jody Kreiman (jkreiman@ucla.edu)
Jean Schoentgen (jschoent@ulb.ac.be)
Dynamics of Emotional Speech Exchanges in Multimodal Communication
Research devoted to understanding the relationship between verbal and nonverbal communication modes, and investigating the perceptual and cognitive processes involved in the coding/decoding of emotional states is particularly relevant in the fields of Human-Human and Human-Computer Interaction.
When it comes to speech, it is unmistakable that the same linguistic expression may be uttered for teasing, challenging, stressing, supporting, inquiring, answering or as expressing an authentic doubt. The appropriate continuance of the interaction depends on detecting the addresser’s mood.
To progress towards a better understanding of such interactional facets, more accurate solutions are needed for defining emotional and empathic contents underpinning daily interactional exchanges, developing signal processing algorithms able to capture emotional features from multimodal social signals and building mathematical models integrating emotional behaviour in interaction strategies.
The themes of this special session are multidisciplinary in nature and closely connected in their final aims to identify features from realistic dynamics of emotional speech exchanges. Of particular interest are analyses of visual, textual and audio information and corresponding computational efforts to automatically detect and interpret their semantic and pragmatic contents.
A special issue of the Journal Computer Speech and Language is foreseen as an outcome of this special session.
Details can be found on the web page: http://www.empathic-project.eu/index.php/ssinterspeech2019/
Organizers:
ANNA ESPOSITO (iiass.annaesp@tin.it; anna.esposito@unicampania.it)
MARIA INÉS TORRES (manes.torres@ehu.eus)
OLGA GORDEEVA (olga.gordeeva@acapela-group.com)
RAQUEL JUSTO (raquel.justo@ehu.eus)
ZORAIDA CALLEJAS CARRIÓN (zoraida@ugr.es)
KRISTIINA JOKINEN (kristiina.jokinen@aist.go.jp)
GENNARO CORDASCO (gennaro.cordasco@unicampania.it)
BJOERN SCHULLER (bjoern.schuller@imperial.ac.uk)
CARL VOGEL (vogel@cs.tcd.ie)
ALESSANDRO VINCIARELLI (Alessandro.Vinciarelli@glasgow.ac.uk)
GERARD CHOLLET (gerard.chollet@telecom-paristech.fr)
NEIL GLACKIN (neil.glackin@intelligentvoice.com)
ssinterspeech2019
Interspeech 2019 Special Session: Dynamics of Emotional Speech Exchanges in Multimodal Communication, Graz, Austria, Sep. 15-19, 2019. Special Session Format: The format of the special session allows …
www.empathic-project.eu
Privacy in Speech and Audio Interfaces
While service quality of speech and audio interfaces can be improved using interconnected devices and cloud services, it simultaneously increases the likelihood and impact of threats to the users’ privacy. This special session is focused on understanding the privacy issues that appear in speech and audio interfaces, as well as on the methods we have for retaining a level of privacy which is appropriate for the user.
Contributions to this session are invited especially for
- Privacy-preserving processing methods for speech and audio
- De-identification and obfuscation for speech and audio
- User-interface design for privacy in speech and audio
- Studies and resources on the experience and perception of privacy in speech and audio signals
- Detection of attacks on privacy in speech and audio interfaces
More information at http://speechprivacy2019.aalto.fi
Organizers:
Tom Bäckström (tom.backstrom@aalto.fi)
Stephan Sigg (stephan.sigg@aalto.fi)
Rainer Martin (rainer.martin@ruhr-uni-bochum.de)
https://speechprivacy2019.aalto.fi/
speechprivacy2019.aalto.fi
Speech Technologies for Code-Switching in Multilingual Communities
Speech technologies exist for many high resource languages, and attempts are being made to reach the next billion users by building resources and systems for many more languages. Multilingual communities pose many challenges for the design and development of speech processing systems. One of these challenges is code-switching, which is the switching of two or more languages at the conversation, utterance and sometimes even word level.
Code-switching is found in text in social media, instant messaging and blogs in multilingual communities, in addition to conversational speech. Monolingual natural language and speech systems fail when they encounter code-switched speech and text. There is a lack of data and linguistic resources for code-switched speech and text. Code-switching provides various interesting challenges to the speech community, such as language modeling for mixed languages, acoustic modeling of mixed language speech, pronunciation modeling and language identification from speech.
The third edition of the special session on speech technologies for code-switching will span these topics, in addition to discussions about data and resources for building code-switched systems.
Web site: https://www.microsoft.com/en-us/research/event/interspeech-2019-special-session-speech-techologies-for-code-switching-in-multilingual-communities/
Organizing Committee:
Kalika Bali (Researcher, Microsoft Research India: kalikab@microsoft.com)
Alan W Black (Professor, Language Technologies Institute, Carnegie Mellon University, USA: awb@cs.cmu.edu)
Julia Hirschberg (Professor, Computer Science Department, Columbia University, USA: julia@cs.columbia.edu)
Sunayana Sitaram (Senior Applied Scientist, Microsoft Research India: sunayana.sitaram@microsoft.com)
Thamar Solorio (Associate Professor, Department of Computer Science, University of Houston, USA: solorio@cs.uh.edu)
Interspeech 2019 Special Session: Speech Techologies for Code-switching in Multilingual Communities - Microsoft Research
Interspeech 2019 Special Session: Speech Techologies for Code-switching in Multilingual Communities
www.microsoft.com
The Second DIHARD Speech Diarization Challenge (DIHARD II)
The Second DIHARD Speech Diarization Challenge (DIHARD II) is an open challenge on speech diarization in difficult acoustic environments, including meeting speech, child language acquisition data, speech in restaurants, and web video. Whereas DIHARD I focused exclusively on diarization from single-channel recordings, DIHARD II, run in conjunction with the organizers of the CHiME challenges, will also include tracks focusing on diarization from multichannel recordings of dinner parties.
Submissions are invited from both academia and industry and may use any dataset (publicly available or proprietary) subject to the challenge rules. Additionally, a development set, which may be used for training, and a baseline system will be provided. Performance will be evaluated using diarization error rate (DER) and a modified version of the Jaccard index. If you are interested and wish to be kept informed, please send an email to the organizers at dihardchallenge@gmail.com and visit the website: https://coml.lscp.ens.fr/dihard/.
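For reference, diarization error rate is the fraction of total reference speech time that is missed, falsely detected, or attributed to the wrong speaker. The sketch below is a frame-level simplification in Python; the official scoring operates on RTTM segments with a segment-based scoring tool and maps system speaker labels to reference labels before counting confusions, so the per-frame, pre-aligned labels here are assumptions.

```python
# Frame-level diarization error rate (DER) sketch.
# ref/hyp are per-frame speaker labels, with None meaning non-speech.
# Assumes system speaker labels are already mapped to reference labels.
def frame_der(ref, hyp):
    miss = fa = conf = 0
    speech = 0
    for r, h in zip(ref, hyp):
        if r is not None:
            speech += 1
            if h is None:
                miss += 1          # speech missed by the system
            elif h != r:
                conf += 1          # wrong speaker assigned
        elif h is not None:
            fa += 1                # non-speech labelled as speech
    return (miss + fa + conf) / speech

ref = ['A', 'A', None, 'B', 'B', 'B']
hyp = ['A', None, 'A', 'B', 'A', 'B']
print(frame_der(ref, hyp))  # (1 miss + 1 false alarm + 1 confusion) / 5 = 0.6
```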
Organizers:
Neville Ryant (Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, USA)
Alejandrina Cristia (Laboratoire de Sciences Cognitives et Psycholinguistique, ENS, Paris, France)
Kenneth Church (Baidu Research, Sunnyvale, CA, USA)
Christopher Cieri (Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, USA)
Jun Du (University of Science and Technology of China, Hefei, China)
Sriram Ganapathy (Electrical Engineering Department, Indian Institute of Science, Bangalore, India)
Mark Liberman (Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, USA)
DIHARD Challenge 2019
Kenneth Church (Baidu Research, Sunnyvale, CA, USA) Kenneth Church has worked on many topics in computational linguistics including: web search, language modeling, text analysis, spelling correction, word-sense disambiguation, terminology, translation, lexicography …
coml.lscp.ens.fr
TTS: text-to-speech, HTS
HTS: HMM/DNN-based Speech Synthesis System
Home - HMM/DNN-based speech synthesis system (HTS)
Welcome! The HMM/DNN-based Speech Synthesis System (HTS) has been developed by the HTS working group and others (see Who we are and Acknowledgments). The training part of HTS has been implemented as a modified version of HTK and released as a form of patch code to HTK …
hts.sp.nitech.ac.jp
https://github.com/llSourcell/Neural_Network_Voices
llSourcell/Neural_Network_Voices
This is the code for "Neural Network Voices" by Siraj Raval on Youtube - llSourcell/Neural_Network_Voices
github.com
Github, Slide, Speech, Synthesis, Tacotron, Tutorial
PRAAT
https://pypi.org/project/praat-parselmouth/
praat-parselmouth
Praat in Python, the Pythonic way
pypi.org
https://github.com/georgiee/lip-sync-lpc
georgiee/lip-sync-lpc
LPC, vowels, formants. A repo to save my research on this topic - georgiee/lip-sync-lpc
github.com
https://github.com/YannickJadoul/Parselmouth
YannickJadoul/Parselmouth
Praat in Python, the Pythonic way. Contribute to YannickJadoul/Parselmouth development by creating an account on GitHub.
github.com
https://parselmouth.readthedocs.io/en/stable/
Parselmouth – Praat in Python, the Pythonic way — Parselmouth 0.3.3 documentation
Though other attempts have been made at porting functionality from Praat to Python, Parselmouth is unique in its aim to provide a complete and Pythonic interface to the internal Praat code. While other projects either wrap Praat's scripting language or reimplement parts of its functionality …
parselmouth.readthedocs.io
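To give a feel for the API, here is a minimal Parselmouth usage sketch; the file name "speech.wav" is an assumption, and the calls follow Parselmouth's documented Sound/Pitch/Formant interface.

```python
# Minimal Parselmouth usage: pitch and formant analysis via Praat from Python.
import parselmouth

snd = parselmouth.Sound("speech.wav")        # load audio with Praat's reader
pitch = snd.to_pitch()                       # Praat's default pitch tracking
f0 = pitch.selected_array['frequency']       # F0 in Hz, 0.0 where unvoiced
formants = snd.to_formant_burg()             # Burg-method formant analysis
f1_mid = formants.get_value_at_time(1, snd.duration / 2)   # F1 at the midpoint
print(f"mean F0: {f0[f0 > 0].mean():.1f} Hz, F1 at midpoint: {f1_mid:.1f} Hz")
```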
Tacotron
Tacotron: Towards End-to-End Speech Synthesis
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle design choices.
arxiv.org
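As a conceptual illustration of the end-to-end idea (characters in, mel-spectrogram frames out via attention), here is a tiny seq2seq sketch in PyTorch. It is not the Tacotron architecture from the paper (which uses CBHG encoder modules, an attention RNN decoder predicting several frames per step, and Griffin-Lim reconstruction); all layer sizes and names are assumptions for illustration only.

```python
# Tiny attention-based text-to-mel sketch, NOT the actual Tacotron architecture.
import torch
import torch.nn as nn

class TinyTTS(nn.Module):
    def __init__(self, n_chars=64, n_mels=80, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(n_chars, hidden)          # character embedding
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(n_mels, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, 4, batch_first=True)
        self.proj = nn.Linear(hidden, n_mels)               # hidden -> mel frame

    def forward(self, chars, prev_mels):
        enc, _ = self.encoder(self.embed(chars))            # encode the text
        dec, _ = self.decoder(prev_mels)                    # autoregressive decoder states
        ctx, _ = self.attn(dec, enc, enc)                   # attend over text encoding
        return self.proj(ctx)                               # predicted mel frames

model = TinyTTS()
mels = model(torch.randint(0, 64, (1, 20)), torch.zeros(1, 100, 80))
print(mels.shape)  # torch.Size([1, 100, 80])
```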
ppt21.com/pb/pb.php?id=humor&no=314206
Deep learning speech synthesis technology
Follow the link at http://carpedm20.github.io/tacotron/ and you can listen to synthesized speech reading various sentences in the voices of Sohn Suk-hee, 503, and President Moon Jae-in. It still feels a bit lacking, but at this level it's practically a decent voice impression. Is this for real? The Sohn Suk-hee version sounds pretty convincing, haha.
ppt21.com
Multi-Speaker Tacotron
Samples (Training data = Son: 15+ hours, Park: 5+ hours, Moon: 2+ hours) Click if you can't hear any sound. Demo sentences (translated from Korean): "Generative adversarial networks and variational autoencoders are hot." (Seo, Son, Park) "Australopithecus afarensis is an extinct hominin species, known today from discovered fossil bones." (Seo, Son, Park, Moon) "I am working as a machine learning engineer at Devsisters …"
carpedm20.github.io
Audio samples related to Tacotron, an end-to-end speech synthesis system by Google.
google.github.io
www.modulabs.co.kr/DeepLAB_Paper/19478
DeepLAB Paper Group - [Paper Group] Tacotron: Towards End-to-End Speech Synthesis
The paper being presented today is Tacotron: Towards End-to-End Speech Synthesis. Over this week and next, the Modulabs DeepLAB paper group is running a speech synthesis special, taking an in-depth look at Tacotron 1 and 2 and Deep Voice 1, 2 and 3, and I am presenting the first paper. Paper link: https://arxiv.org/pdf/1703.10135.pdf Slides: [Kim Seung-il 201804 …
www.modulabs.co.kr
Tacotron2, as explained by a high school student
Tacotron2
medium.com
Paper impressions from a CS undergrad - Tacotron: Towards End-to-End Speech Synthesis (1)
A review-ish set of impressions written after reading Tacotron: Towards End-to-End Speech Synthesis. ※Note: the author is a first-year undergraduate who has not read many papers and does not have much specialized knowledge, and read this mostly out of interest …
snowapril7758.tistory.com
nblog.syszone.co.kr/archives/9416
Deep learning speech synthesis: installing and using multi-speaker-tacotron (tacotron + deepvoice) - Syszone
Deep learning speech synthesis: installing and using multi-speaker-tacotron (tacotron + deepvoice), by Seo Jin-woo · Published 2018-10-17 · Updated 2018-10-17. Author: Seo Jin-woo, Clunix (alang@clunix.com), written 2018-03-01. Tacotron, an open deep learning model for speech synthesis (TTS) …
nblog.syszone.co.kr
developers-kr.googleblog.com/2017/12/tacotron-2-generating-human-like-speech-from-text.html
Tacotron 2: Generating Human-like Speech from Text
<The original blog post can be found here, and the translation was reviewed by Jeon Tae-gyun (Machine Learning GDE).> Posted by: Jonathan Shen and Ruoming Pang (Software Engineers, Google Brain …
developers-kr.googleblog.com
keithito/tacotron
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial) - keithito/tacotron
github.com
github.com/carpedm20/multi-speaker-tacotron-tensorflow
carpedm20/multi-speaker-tacotron-tensorflow
Multi-speaker Tacotron in TensorFlow. Contribute to carpedm20/multi-speaker-tacotron-tensorflow development by creating an account on GitHub.
github.com
SMARTEAR
Awesome Speech
https://github.com/zzw922cn/awesome-speech-recognition-speech-synthesis-papers
zzw922cn/awesome-speech-recognition-speech-synthesis-papers
Speech synthesis, voice conversion, self-supervised learning, music generation, Automatic Speech Recognition, Speaker Verification, Speech Synthesis, Language Modeling - zzw922cn/awesome-speech-reco...
github.com