The following is the list of accepted ASRU 2017 papers, sorted by paper title. You can use your web browser's search feature to find your paper number. Notifications have also been sent to all authors by email. If you have not received your notification by email, please contact us at [email protected].
1212 | A CONTEXT-AWARE SPEECH RECOGNITION AND UNDERSTANDING SYSTEM FOR AIR TRAFFIC CONTROL DOMAIN |
1205 | A HIERARCHICAL ATTENTION BASED MODEL FOR OFF-TOPIC SPONTANEOUS SPOKEN RESPONSE DETECTION |
1078 | AALTO SYSTEM FOR THE 2017 ARABIC MULTI-GENRE BROADCAST CHALLENGE |
1058 | ACOUSTIC-TO-WORD MODEL WITHOUT OOV |
1116 | ADVERSARIAL MANIFOLD LEARNING FOR SPEAKER RECOGNITION |
1191 | ADVERSARIAL TRAINING FOR DATA-DRIVEN SPEECH ENHANCEMENT WITHOUT PARALLEL CORPUS |
1079 | AN EMBEDDED SEGMENTAL K-MEANS MODEL FOR UNSUPERVISED SEGMENTATION AND CLUSTERING OF SPEECH |
1301 | AN INVESTIGATION OF MULTI-SPEAKER TRAINING FOR WAVENET VOCODER |
1290 | ATTENTION-BASED WAV2TEXT WITH FEATURE TRANSFER LEARNING |
1143 | AUTOMATIC SPEECH RECOGNITION OF ARABIC MULTI-GENRE BROADCAST MEDIA |
1147 | BINAURAL PROCESSING FOR ROBUST RECOGNITION OF DEGRADED SPEECH |
1131 | CHARACTER-BASED UNITS FOR UNLIMITED VOCABULARY CONTINUOUS SPEECH RECOGNITION |
1223 | COMPARISON OF MULTIPLE FEATURES AND MODELING METHODS FOR TEXT-DEPENDENT SPEAKER VERIFICATION |
1242 | COMPOSITE EMBEDDING SYSTEMS FOR ZEROSPEECH2017 TRACK1 |
1118 | COMPUTATIONAL COST REDUCTION OF LONG SHORT-TERM MEMORY BASED ON SIMULTANEOUS COMPRESSION OF INPUT AND HIDDEN STATE |
1065 | CONSISTENT DNN UNCERTAINTY TRAINING AND DECODING FOR ROBUST ASR |
1282 | CRACKING THE COCKTAIL PARTY PROBLEM BY MULTI-BEAM DEEP ATTRACTOR NETWORK |
1125 | CROSS-DOMAIN SPEECH RECOGNITION USING NONPARALLEL CORPORA WITH CYCLE-CONSISTENT ADVERSARIAL NETWORKS |
1244 | DBLSTM BASED MULTILINGUAL ARTICULATORY FEATURE EXTRACTION FOR LANGUAGE DOCUMENTATION |
1285 | DEEP LEARNING METHODS FOR UNSUPERVISED ACOUSTIC MODELING - LEAP SUBMISSION TO ZEROSPEECH CHALLENGE 2017 |
1019 | DEEP QUATERNION NEURAL NETWORKS FOR SPOKEN LANGUAGE UNDERSTANDING |
1096 | DENOTATION EXTRACTION FOR INTERACTIVE LEARNING IN DIALOGUE SYSTEMS |
1228 | DIRECT MODELING OF RAW AUDIO WITH DNNS FOR WAKE WORD DETECTION |
1215 | DYNAMIC TIME-AWARE ATTENTION TO SPEAKER ROLES AND CONTEXTS FOR SPOKEN LANGUAGE UNDERSTANDING |
1271 | EARLY AND LATE INTEGRATION OF AUDIO FEATURES FOR AUTOMATIC VIDEO DESCRIPTION |
1080 | END-TO-END TEXT-INDEPENDENT SPEAKER VERIFICATION WITH FLEXIBILITY IN UTTERANCE DURATION |
1121 | ERROR DETECTION OF GRAPHEME-TO-PHONEME CONVERSION IN TEXT-TO-SPEECH SYNTHESIS USING SPEECH SIGNAL AND LEXICAL CONTEXT |
1154 | EXPLORING ARCHITECTURES, DATA AND UNITS FOR STREAMING END-TO-END SPEECH RECOGNITION WITH RNN-TRANSDUCER |
1312 | EXPLORING ASR-FREE END-TO-END MODELING TO IMPROVE SPOKEN LANGUAGE UNDERSTANDING IN A CLOUD-BASED DIALOG SYSTEM |
1075 | EXPLORING THE USE OF ACOUSTIC EMBEDDINGS IN NEURAL MACHINE TRANSLATION |
1134 | EXTRACTING BOTTLENECK FEATURES AND WORD-LIKE PAIRS FROM UNTRANSCRIBED SPEECH FOR FEATURE REPRESENTATION |
1213 | FEATURE OPTIMIZED DPGMM CLUSTERING FOR UNSUPERVISED SUBWORD MODELING: A CONTRIBUTION TO ZEROSPEECH 2017 |
1055 | FUTURE VECTOR ENHANCED LSTM LANGUAGE MODEL FOR LVCSR |
1051 | FUTURE WORD CONTEXTS IN NEURAL NETWORK LANGUAGE MODELS |
1132 | GATED CONVOLUTIONAL NETWORKS BASED HYBRID ACOUSTIC MODELS FOR LOW RESOURCE SPEECH RECOGNITION |
1033 | GROUND TRUTH ESTIMATION OF SPOKEN ENGLISH FLUENCY SCORE USING DECORRELATION PENALIZED LOW-RANK MATRIX FACTORIZATION |
1072 | GROUNDED LANGUAGE UNDERSTANDING FOR MANIPULATION INSTRUCTIONS USING GAN-BASED CLASSIFICATION |
1142 | HIERARCHICAL RECURRENT NEURAL NETWORK FOR STORY SEGMENTATION USING FUSION OF LEXICAL AND ACOUSTIC FEATURES |
1169 | IMPROVING NATIVE LANGUAGE (L1) IDENTIFICATION WITH BETTER VAD AND TDNN TRAINED SEPARATELY ON NATIVE AND NON-NATIVE ENGLISH CORPORA |
1295 | IMPROVING SEPARATION OF OVERLAPPED SPEECH FOR MEETING CONVERSATIONS USING UNCALIBRATED MICROPHONE ARRAY |
1245 | IMPROVING THE EFFICIENCY OF FORWARD-BACKWARD ALGORITHM USING BATCHED COMPUTATION IN TENSORFLOW |
1184 | INCREMENTAL TRAINING AND CONSTRUCTING THE VERY DEEP CONVOLUTIONAL RESIDUAL NETWORK ACOUSTIC MODELS |
1229 | INTEGRATED SPEAKER-ADAPTIVE SPEECH SYNTHESIS |
1241 | INVESTIGATING NATIVE AND NON-NATIVE ENGLISH CLASSIFICATION AND TRANSFER EFFECTS USING LEGENDRE POLYNOMIAL COEFFICIENT CLUSTERING |
1025 | INVESTIGATION OF LATTICE-FREE MAXIMUM MUTUAL INFORMATION-BASED ACOUSTIC MODELS WITH SEQUENCE-LEVEL KULLBACK-LEIBLER DIVERGENCE |
1263 | INVESTIGATION OF TRANSFER LEARNING FOR ASR USING LF-MMI TRAINED NEURAL NETWORKS |
1092 | ITERATIVE POLICY LEARNING IN END-TO-END TRAINABLE TASK-ORIENTED NEURAL DIALOG MODELS |
1097 | JHU KALDI SYSTEM FOR ARABIC MGB-3 ASR CHALLENGE USING DIARIZATION, AUDIO-TRANSCRIPT ALIGNMENT AND TRANSFER LEARNING |
1260 | KEYWORD SPOTTING FOR GOOGLE ASSISTANT USING CONTEXTUAL SPEECH RECOGNITION |
1045 | LANGUAGE DIARIZATION FOR SEMI-SUPERVISED BILINGUAL ACOUSTIC MODEL TRAINING |
1257 | LANGUAGE INDEPENDENT END-TO-END ARCHITECTURE FOR JOINT LANGUAGE IDENTIFICATION AND SPEECH RECOGNITION |
1217 | LANGUAGE MODELING WITH HIGHWAY LSTM |
1308 | LANGUAGE MODELING WITH NEURAL TRANS-DIMENSIONAL RANDOM FIELDS |
1148 | LATTICE RESCORING STRATEGIES FOR LONG SHORT TERM MEMORY LANGUAGE MODELS IN SPEECH RECOGNITION |
1259 | LEARNING MODALITY-INVARIANT REPRESENTATIONS FOR SPEECH AND IMAGES |
1120 | LEARNING SPEAKER REPRESENTATION FOR NEURAL NETWORK BASED MULTICHANNEL SPEAKER EXTRACTION |
1199 | LEVERAGING NATIVE LANGUAGE SPEECH FOR ACCENT IDENTIFICATION USING DEEP SIAMESE NETWORKS |
1020 | LEVERAGING SIDE INFORMATION FOR SPEAKER IDENTIFICATION WITH THE ENRON CONVERSATIONAL TELEPHONE SPEECH COLLECTION |
1293 | LISTENING WHILE SPEAKING: SPEECH CHAIN BY DEEP LEARNING |
1189 | MEETING RECOGNITION WITH ASYNCHRONOUS DISTRIBUTED MICROPHONE ARRAY |
1183 | MGB-3 BUT SYSTEM: LOW-RESOURCE ASR ON EGYPTIAN YOUTUBE DATA |
1010 | MINIMALLY SUPERVISED WRITTEN-TO-SPOKEN TEXT NORMALIZATION |
1304 | MITIGATING THE IMPACT OF SPEECH RECOGNITION ERRORS ON CHATBOT USING SEQUENCE-TO-SEQUENCE MODEL |
1266 | MIT-QCRI ARABIC DIALECT IDENTIFICATION SYSTEM FOR THE 2017 MULTI-GENRE BROADCAST CHALLENGE |
1178 | MODELING CHOICES IN END-TO-END SPEECH RECOGNITION |
1270 | MULTI-LEVEL LANGUAGE MODELING AND DECODING FOR OPEN VOCABULARY END-TO-END SPEECH RECOGNITION |
1133 | MULTILINGUAL BOTTLE-NECK FEATURE LEARNING FROM UNTRANSCRIBED SPEECH |
1042 | MULTI-TASK ENSEMBLES WITH STUDENT-TEACHER TRAINING |
1108 | MULTITASK TRAINING WITH UNLABELED DATA FOR END-TO-END SIGN LANGUAGE FINGERSPELLING RECOGNITION |
1182 | MULTI-VIEW (JOINT) PROBABILITY LINEAR DISCRIMINATION ANALYSIS FOR J-VECTOR BASED TEXT DEPENDENT SPEAKER VERIFICATION |
1117 | NEURAL RELEVANCE-AWARE QUERY MODELING FOR SPOKEN DOCUMENT RETRIEVAL |
1044 | NOISE-ROBUST EXEMPLAR MATCHING FOR RESCORING QUERY-BY-EXAMPLE SEARCH |
1192 | ON LATTICE GENERATION FOR LARGE VOCABULARY SPEECH RECOGNITION |
1175 | ONENET: JOINT DOMAIN, INTENT, SLOT PREDICTION FOR SPOKEN LANGUAGE UNDERSTANDING |
1023 | PERCEPTUAL QUALITY AND MODELING ACCURACY OF EXCITATION PARAMETERS IN DLSTM-BASED SPEECH SYNTHESIS SYSTEMS |
1152 | PERSONALIZED WORD REPRESENTATIONS CARRYING PERSONALIZED SEMANTICS LEARNED FROM SOCIAL NETWORK POSTS |
1015 | REDUCING THE COMPUTATIONAL COMPLEXITY FOR WHOLE WORD MODELS |
1240 | SCALABLE MULTI-DOMAIN DIALOGUE STATE TRACKING |
1047 | SEEING AND HEARING TOO: AUDIO REPRESENTATION FOR VIDEO CAPTIONING |
1027 | SEMI-SUPERVISED TRAINING STRATEGIES FOR DEEP NEURAL NETWORKS |
1151 | SEQUENCE TRAINING OF DNN ACOUSTIC MODELS WITH NATURAL GRADIENT |
1203 | SIMPLIFYING VERY DEEP CONVOLUTIONAL NEURAL NETWORK ARCHITECTURES FOR ROBUST SPEECH RECOGNITION |
1048 | SPARSE REPRESENTATION OF PHONETIC FEATURES FOR VOICE CONVERSION WITH AND WITHOUT PARALLEL DATA |
1174 | SPEAKER-SENSITIVE DUAL MEMORY NETWORKS FOR MULTI-TURN SLOT TAGGING |
1317 | SPEECH RECOGNITION CHALLENGE IN THE WILD: ARABIC MGB-3 |
1238 | SPOKEN LANGUAGE BIOMARKERS FOR DETECTING COGNITIVE IMPAIRMENT |
1095 | SPOOFING DETECTION VIA SIMULTANEOUS VERIFICATION OF AUDIO-VISUAL SYNCHRONICITY AND TRANSCRIPTION |
1110 | STATISTICAL PARAMETRIC SPEECH SYNTHESIS USING GENERATIVE ADVERSARIAL NETWORKS UNDER A MULTI-TASK LEARNING FRAMEWORK |
1251 | STREAMING SMALL-FOOTPRINT KEYWORD SPOTTING USING SEQUENCE-TO-SEQUENCE MODELS |
1196 | SUBBAND WAVENET WITH OVERLAPPED SINGLE-SIDEBAND FILTERBANKS |
1150 | SYLLABLE-BASED ACOUSTIC MODELING WITH CTC-SMBR-LSTM |
1193 | TACKLING UNSEEN ACOUSTIC CONDITIONS IN QUERY-BY-EXAMPLE SEARCH USING TIME AND FREQUENCY CONVOLUTION FOR MULTILINGUAL DEEP BOTTLENECK FEATURES |
1319 | THE BLIZZARD MACHINE LEARNING CHALLENGE 2017 |
1088 | THE CMU ENTRY TO BLIZZARD MACHINE LEARNING CHALLENGE |
1220 | THE IFLYTEK SYSTEM FOR BLIZZARD MACHINE LEARNING CHALLENGE 2017-ES1 |
1216 | THE USTC SYSTEM FOR BLIZZARD MACHINE LEARNING CHALLENGE 2017-ES2 |
1318 | THE ZERO RESOURCE SPEECH CHALLENGE 2017 |
1066 | TOPIC SEGMENTATION IN ASR TRANSCRIPTS USING BIDIRECTIONAL RNNS FOR CHANGE DETECTION |
1105 | TURBO FUSION OF MAGNITUDE AND PHASE INFORMATION FOR DNN-BASED PHONEME RECOGNITION |
1160 | UNSUPERVISED ADAPTATION OF STUDENT DNNS LEARNED FROM TEACHER RNNS FOR IMPROVED ASR PERFORMANCE |
1181 | UNSUPERVISED ADAPTATION WITH DOMAIN SEPARATION NETWORKS FOR ROBUST SPEECH RECOGNITION |
1138 | UNSUPERVISED DOMAIN ADAPTATION FOR ROBUST SPEECH RECOGNITION VIA VARIATIONAL AUTOENCODER-BASED DATA AUGMENTATION |
1284 | UNSUPERVISED HMM POSTERIOGRAMS FOR LANGUAGE INDEPENDENT ACOUSTIC MODELING IN ZERO RESOURCE CONDITIONS |
1250 | UNWRITTEN LANGUAGES DEMAND ATTENTION TOO! WORD DISCOVERY WITH ENCODER-DECODER MODELS |
1164 | UTD-CRSS SUBMISSION FOR MGB-3 ARABIC DIALECT IDENTIFICATION: FRONT-END AND BACK-END ADVANCEMENTS ON BROADCAST SPEECH |
1128 | WERD: USING SOCIAL TEXT SPELLING VARIANTS FOR EVALUATING DIALECTAL SPEECH RECOGNITION |