Available phrasebanks
Academic Phrasebank
The Elsevier OA CC-BY corpus contains 40k articles from Elsevier's journals, covering fields from Arts and Business to STEM and the Social Sciences[1].
No. | Phrasebank | Source | N-gram Length | Lines | Comments |
---|---|---|---|---|---|
1 | | Book Academic Phrasebank 2014 | 2-5 | 2,190 | Extracted from the PDF (Zhihao, 2024) |
2 | | Corpus Elsevier OA CC-BY 2020 | 2-6 | 3,792 | Extracted by n-gram (Zhihao, 2024) |
3 | bawe_1000.csv | | 4-6 | 1,000 | Because the source is not accessible, only the 1,000 most frequent entries are listed here (Zhihao, 2024) |
4 | academic_word_list | | 1 | 570 | The 570 words for academic English (excluding the 2,000 most frequent words) |
5 | elsevier_awl | 2, 4 | 2-6 | 994 | The Elsevier phrasebank entries that contain AWL words (Zhihao, 2024) |
6 | | 2 | 2-7 | 3,700 | Environment & Earth Science 3,700 collection (Zhihao, 2024) |
7 | | 2 | 2-7 | 3,700 | Social Science & Psychology 3,700 collection (Zhihao, 2024) |
8 | elsevier_MEDI | 2 | 2-7 | 3,700 | Medicine 3,700 collection (Zhihao, 2024) |
English Frequent Phrasebank
No. | Phrasebank | Source | N-gram Length | Lines | Comments |
---|---|---|---|---|---|
1 | | Google Books Corpus | 1 | 10,000 | The 10,000 most common English words from Google Books Corpus |
2 | | Internet | 1 | 2,000 | The 2,000 most common English words |
Note
To use a phrasebank, you might need to reformat it for your IDE or input software.
Common phrases that carry little meaning (e.g., "et al." in academic publications) are excluded, but not all of them, so some manual cleanup is necessary.
The phrasebank may vary when different frequency criteria are used.
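The last two points can be illustrated with a minimal sketch. It is not the extraction pipeline used to build these phrasebanks; the function name `extract_ngrams`, its parameters, the regex tokenizer, and the one-entry stop list are all assumptions made for illustration. It counts word n-grams, drops any n-gram containing a stop phrase, and keeps those that reach a minimum count, so changing `min_count` changes the resulting list.

```python
import re
from collections import Counter


def extract_ngrams(sentences, n_min=2, n_max=5, min_count=2,
                   stop_phrases=("et al",)):
    """Return (phrase, count) pairs for word n-grams seen at least min_count times.

    Hypothetical helper: the tokenizer and the stop list are illustrative
    assumptions, not the criteria used for the published phrasebanks.
    """
    counts = Counter()
    for sentence in sentences:
        # Lowercase and keep alphabetic tokens only, so "al." becomes "al".
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for n in range(n_min, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1

    kept = []
    for phrase, count in counts.items():
        if count < min_count:
            continue
        # Compare on word boundaries so "set all" is not mistaken for "et al".
        padded = f" {phrase} "
        if any(f" {stop} " in padded for stop in stop_phrases):
            continue
        kept.append((phrase, count))

    # Most frequent first, ties broken alphabetically.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


if __name__ == "__main__":
    sample = [
        "The results of this study show that the effect is small.",
        "The results of this study suggest a different pattern.",
        "Smith et al. report the results of a survey.",
        "Jones et al. report similar findings.",
    ]
    # A looser threshold keeps more phrases than a stricter one.
    print(extract_ngrams(sample, n_min=2, n_max=3, min_count=2))
    print(extract_ngrams(sample, n_min=2, n_max=3, min_count=3))
```

With the sample above, the stop list removes "et al" and "et al report" but leaves the fragment "al report", which is the kind of leftover that still needs manual cleanup.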
Known Issues
Phrasebank | Issues |
---|---|
academic_phrasebank | Because the tables in the PDF file were not handled properly, many sentences were not extracted correctly. (Zhihao) |
elsevier_phrasebank | |