This guide describes the steps for running the Facebook FairSeq m2m_100 multilingual translation model in a CPU-only environment. FairSeq m2m_100 is considered to be one of the most advanced machine translation models (as of November 2020). For more information, look at the fairseq page.
With the modifications described below, the model ran on CPU without problems; no bugs occurred during the generation.
Perform preprocessing (binarization) on our input data (the fairseq-preprocess options are shown in the command listing later in this guide).
Facebook AI is introducing M2M-100, the first multilingual machine translation (MMT) model that translates between any pair of 100 languages without relying on English data.
For my case, I want to translate from Thai to Indonesian.
Even if we install the CPU-only version of PyTorch, FairSeq will still try to use the GPU (it calls torch.cuda.* functions directly, causing exceptions to be raised).
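To see why this is a problem on a CPU-only install, here is a minimal check in plain PyTorch (nothing fairseq-specific is assumed):

    import torch

    print(torch.cuda.is_available())    # False on a CPU-only build
    try:
        # fairseq makes torch.cuda.* calls of this kind directly
        torch.cuda.current_device()
    except Exception as exc:
        print(type(exc).__name__, exc)  # e.g. "Torch not compiled with CUDA enabled"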
For the pipeline settings, pipeline_devices is a list of device indices indicating which device each of the N_K partitions is placed on, and sum(args.pipeline_decoder_balance) should equal the total number of decoder layers in the model.
Fairseq will automatically pick them up and it should improve speed by a decent amount.
The new way of calling this function is to pass a parameter like torch.device('cuda:0'), which can also support the CPU target via torch.device('cpu'); but setting ['cpu', 'cpu'] in the m2m_100 input parameters breaks many parts of FairSeq, which require the parameter to be a number.
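A minimal illustration of the two calling styles, in plain PyTorch:

    import torch

    # Legacy style: a bare integer is always interpreted as a CUDA device index.
    legacy = torch.device(0)         # -> device(type='cuda', index=0)

    # Newer style: an explicit device string, which also supports the CPU.
    cpu = torch.device("cpu")        # -> device(type='cpu')
    gpu = torch.device("cuda:0")     # -> device(type='cuda', index=0)

    x = torch.zeros(2, 2).to(cpu)    # works on a CPU-only install
    # torch.zeros(2, 2).to(legacy)   # raises on a CPU-only install
    print(legacy, cpu, gpu, x.device)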
We need to specify which GPUs the parts of the model are sent to, using parameters like the pipeline balance and device lists shown in the generation command later in this guide. Here it is necessary to set the number of devices and the balances correctly (I believe they have to match the settings the model was trained with); otherwise the pretrained weights cannot be loaded, because the tensor names contain the device and balance numbers.
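As a quick sanity check (a hypothetical snippet, not a fairseq API), the balance and device lists used in the generation command later in this guide line up partition-for-partition:

    # Values taken from the --model-overrides used later in this guide.
    pipeline_balance = [1, 15, 13, 11, 11, 1]   # layers per pipeline partition
    pipeline_devices = [0, 1, 0, 2, 3, 0]       # device id for each partition

    assert len(pipeline_balance) == len(pipeline_devices), "one device per partition"
    print(sum(pipeline_balance), "pipeline layers spread over",
          len(set(pipeline_devices)), "devices")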
Perform the translation (the fairseq-generate options are shown in the command listing later in this guide).
Fairseq resolves the targeted GPU by calling torch.device() with these numbers as parameters, which is a legacy way of calling the function and does not support the CPU target at all.
Fairseq provides several command-line tools for training and evaluating models:
- fairseq-preprocess: data pre-processing: build vocabularies and binarize training data
- fairseq-train: train a new model on one or multiple GPUs
- fairseq-generate: translate pre-processed data with a trained model
- fairseq-interactive: translate raw text with a trained model
If you installed all libraries inside a venv environment, you may need to search for the fairscale file (discussed later in this guide) inside the virtual environment instead.
When translating, say, Chinese to French, previous best multilingual models train on Chinese to English and English to French, because English training data is the most widely available; M2M-100 translates directly instead, and the result is a big leap in all translation benchmarking metrics.
Prepare the input data. I created and ran a script to do this task (the individual commands are collected in the listing later in this guide).
SWAP memory: at least 128 GB; see https://bogdancornianu.com/change-swap-size-in-ubuntu/ for setting up SWAP memory.
Fairseq is Facebook's sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks. The pretrained m2m_100 checkpoint was trained with model (pipeline) parallelism; that means the model was divided into multiple parts that can be run in parallel across many GPUs.
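Conceptually (a toy sketch, not fairseq code), pipeline parallelism splits consecutive groups of layers into partitions, each of which would live on its own device, and activations are handed from one partition to the next:

    import torch
    import torch.nn as nn

    layers = [nn.Linear(8, 8) for _ in range(6)]
    balance = [2, 2, 2]                  # layers per partition, cf. pipeline_balance
    partitions, i = [], 0
    for n in balance:
        partitions.append(nn.Sequential(*layers[i:i + n]))  # each part would sit on its own device
        i += n

    x = torch.randn(1, 8)
    for part in partitions:              # activations flow from partition to partition
        x = part(x)
    print(x.shape)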
Here are the code excerpts and command fragments used in this guide.

The constructor of PipelineParallelTransformerModel, which builds its device list from the visible CUDA devices (excerpt):

    class PipelineParallelTransformerModel(BaseFairseqModel):
        def __init__(self, encoder, decoder, balance, devices, chunks, checkpoint):
            ...
            devices = range(torch.cuda.device_count())
            devices = [torch.device(d) for d in devices]
            devices = cast(List[torch.device], devices)

Downloading the SentencePiece model and the dictionaries:

    wget https://dl.fbaipublicfiles.com/m2m_100/spm.128k.model
    wget https://dl.fbaipublicfiles.com/m2m_100/data_dict.128k.txt
    wget https://dl.fbaipublicfiles.com/m2m_100/model_dict.128k.txt
    wget https://dl.fbaipublicfiles.com/m2m_100/language_pairs.txt

Binarizing the input with the shared dictionaries (fairseq-preprocess options):

    --srcdict data_dict.128k.txt --tgtdict data_dict.128k.txt

Generating the translation (fairseq-generate options):

    --decoder-langtok --encoder-langtok src \
    --distributed-world-size 1 --distributed-no-spawn \
    --model-overrides '{"ddp_backend": "c10d", "pipeline_balance": "1, 15, 13, 11, 11, 1", "pipeline_devices": "0, 1, 0, 2, 3, 0"}' \
    --pipeline-decoder-balance '[3,11,11,1]' \
    --pipeline-decoder-devices '[0,2,3,0]' > gen_out
m2m_100 uses a transformer-based model to do direct translation between any pair of the 100 supported languages, without routing through an intermediate language (English) as the majority of machine translation models do. fairseq is a PyTorch-based framework for sequence modeling tasks such as machine translation and text generation.
Sample output from gen_out:

    H-1 -2.0654807090759277 Pekerja Layanan Kotor
    D-1 -2.0654807090759277 Pekerja Layanan Kotor
    P-1 -6.4449 -4.4084 -0.6868 -0.2564 -2.1839 -0.1741 -3.2183 -0.6242 -0.5922
    D-2 -2.5059561729431152 Berjalan jarak jauh?
    P-2 -6.9086 -5.2208 -0.9001 -3.2984 -0.3153 -0.7454 -0.1531
    H-3 -1.5292832851409912 Saya ingin lebih banyak karyawan.
    D-3 -1.5292832851409912 Saya ingin lebih banyak karyawan.

(H- lines are the tokenized hypotheses, D- lines the detokenized output after post-processing, and P- lines the per-token log probabilities.)
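If you need the translations programmatically, here is a small sketch that pulls the detokenized hypotheses (the D- lines) out of the gen_out file produced above:

    # gen_out lines look like "D-1<TAB>score<TAB>text"; collect the text per sentence id.
    hyps = {}
    with open("gen_out", encoding="utf-8") as f:
        for line in f:
            if line.startswith("D-"):
                tag, _score, text = line.rstrip("\n").split("\t", 2)
                hyps[int(tag[2:])] = text
    for idx in sorted(hyps):
        print(idx, hyps[idx])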
According to Facebook AI, the single multilingual model performs as well as traditional bilingual models and achieved a 10 BLEU point improvement over English-centric multilingual models.
So my fairscale is at .local/lib/python3.8/site-packages/fairscale.** The file to modify is .local/lib/python3.8/site-packages/fairscale/nn/pipe/pipe.py, line 509, the __init__ function of the Pipe class.
* In the following steps, I will demonstrate the commands to do translation from Thai to Indonesian.
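If you are not sure where pip placed fairscale, a quick way to locate the installed package is:

    import os
    import fairscale

    # Prints something like .../site-packages/fairscale/__init__.py;
    # nn/pipe/pipe.py lives under the same package directory.
    print(fairscale.__file__)
    print(os.path.join(os.path.dirname(fairscale.__file__), "nn", "pipe", "pipe.py"))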
Put the pretrained weight file inside the fairseq directory. Another problem is that the device numbers are the IDs of the GPUs that the weights are sent to.
I will put all my input text, in Thai, inside the raw input file.
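Before binarization, the raw text has to be encoded with the downloaded SentencePiece model. A minimal sketch follows; the file names raw.th and spm.th are placeholders, and it assumes the sentencepiece Python package is installed (fairseq also ships scripts/spm_encode.py for the same job):

    import sentencepiece as spm

    sp = spm.SentencePieceProcessor()
    sp.Load("spm.128k.model")            # downloaded earlier in this guide

    with open("raw.th", encoding="utf-8") as fin, \
         open("spm.th", "w", encoding="utf-8") as fout:
        for line in fin:
            pieces = sp.EncodeAsPieces(line.strip())
            fout.write(" ".join(pieces) + "\n")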
Requiring several GPUs to run the model is prohibitive for many PoC / development environments and for the majority of developer communities.
Note that we cannot use fp16 here, as the CPU does not support half-precision floating point the way GPUs do.
You can download the m2m_100 model and its resources from https://github.com/pytorch/fairseq/tree/master/examples/m2m_100.
** Depending on your library installation.
The other file to modify is fairseq/model_parallel/models/pipeline_parallel_transformer/model.py, line 44, the __init__ function of the PipelineParallelTransformerModel class.
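The exact edits depend on your fairseq and fairscale versions; as a rough sketch of the kind of change involved (not the exact patch), the device list that the constructor builds from numeric CUDA ids can be forced onto the CPU when no GPU is available:

    # Hypothetical sketch only: fall back to CPU devices when CUDA is unavailable,
    # instead of building the list from torch.cuda.device_count().
    import torch

    def resolve_devices(device_ids):
        """Map numeric device ids to torch.device objects, using the CPU
        for every partition when no GPU is available."""
        if not torch.cuda.is_available():
            return [torch.device("cpu") for _ in device_ids]
        return [torch.device(d) for d in device_ids]

    # e.g. the six pipeline partitions from the generation command above:
    print(resolve_devices([0, 1, 0, 2, 3, 0]))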
Facebook does offer an open-source library, Fairseq, and pre-trained models.
I did some research and debugged the source code of FairSeq and m2m_100, and was able to modify the source code and settings to run the m2m_100 model on a CPU-only machine.