In this article I'll show how I built my own text classification dataset, which I needed for giving commands to my smart home assistant. I work with the Hugging Face Transformers library; the dataset itself is built with its companion datasets library.
Defining the task for the dataset
In this case I want to train my assistant so that when I say something like “Turn on the music”, it replies with its understanding of what I asked for. In other words, I say “turn on the music” and it answers “music_on”. A few more examples:
- Turn on the light – light_on
- Turn off the light – light_off
- Crank up the tunes – music_on
- Kill the music, it's distracting – music_off
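To train a classifier, every intent name needs a fixed integer id. Here is a minimal sketch, assuming a hypothetical INTENTS list whose order defines the ids: music_off = 0, music_on = 1 and light_on = 3 follow the labels used later in this article, while light_off = 2 is an assumption. Transformers classification models can also take such dictionaries through their id2label and label2id config arguments, so predictions come back as readable names.

```python
# Hypothetical intent list: the position of each name is its integer label.
# music_off=0, music_on=1, light_on=3 match the article; light_off=2 is assumed.
INTENTS = ["music_off", "music_on", "light_off", "light_on"]

label2id = {name: idx for idx, name in enumerate(INTENTS)}
id2label = {idx: name for name, idx in label2id.items()}

# The example requests from the list above, paired with their intent names.
examples = [
    ("Turn on the light", "light_on"),
    ("Turn off the light", "light_off"),
    ("Crank up the tunes", "music_on"),
    ("Kill the music, it's distracting", "music_off"),
]

# Column layout used by the dataset code: a "text" list and an integer "label" list.
rows = {
    "text": [text for text, _ in examples],
    "label": [label2id[intent] for _, intent in examples],
}

print(rows["label"])               # [3, 2, 1, 0]
print(id2label[rows["label"][2]])  # music_on
```

Keeping the mapping in one place means the integers you type by hand later and the names the model returns cannot drift apart.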
How do you create a smart home dataset in Python?
This part of the article has two sections. The first covers how to create such a dataset from scratch and how to add a few items to it by hand; the second covers building a dataset from a CSV file.
How do you create a dataset manually on your own machine?
First, we create empty training and test datasets:
from datasets import Dataset, DatasetDict

# Two empty splits with the columns used throughout: "text" and "label"
train_dataset = Dataset.from_dict({
    "text": [],
    "label": []
})
test_dataset = Dataset.from_dict({
    "text": [],
    "label": []
})
Then we run an endless loop that adds new data to the datasets. The code asks whether you want to add a new item. If you answer yes, you type the text of a possible request and its class. The class must be given as an integer: for example, the text “turn off the music” gets label 0, “turn on the music” gets label 1, “turn on the light” gets label 3, and so on. Each pass through the loop adds one example to the training set and one to the test set:
while True:
    answer = input("Add a new item? y/n: ")
    if answer.strip().lower() != "y":
        break
    # add_item returns a new Dataset, so the result must be reassigned
    train_dataset = train_dataset.add_item({
        "text": input("Enter train text: "),
        "label": int(input("Enter train label number: ")),
    })
    test_dataset = test_dataset.add_item({
        "text": input("Enter test text: "),
        "label": int(input("Enter test label number: ")),
    })
Next, we put both datasets into a single parent DatasetDict and save it:
dataset_dict = DatasetDict({"train": train_dataset, "test": test_dataset})
print("Saving dataset...", dataset_dict)
dataset_dict.save_to_disk("my_dataset_v1")
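Before saving, it can be worth checking how many examples each class has in each split, since with manual entry it is easy to leave a class out of one split entirely. Here is a minimal sketch that works on plain sequences of integer labels (for example the values of train_dataset["label"] and test_dataset["label"]); the function name split_report is hypothetical.

```python
from collections import Counter


def split_report(train_labels, test_labels):
    """Print per-label counts for both splits and return labels missing from each split."""
    train_counts = Counter(train_labels)
    test_counts = Counter(test_labels)
    all_labels = sorted(set(train_counts) | set(test_counts))
    for label in all_labels:
        # Counter returns 0 for a label it has never seen
        print(f"label {label}: train={train_counts[label]}, test={test_counts[label]}")
    missing_in_train = [label for label in all_labels if train_counts[label] == 0]
    missing_in_test = [label for label in all_labels if test_counts[label] == 0]
    return missing_in_train, missing_in_test


missing_train, missing_test = split_report([0, 1, 1, 3], [0, 1])
print(missing_train, missing_test)  # [] [3]
```

A label that appears only in the training set cannot be evaluated, and one that appears only in the test set can never be predicted correctly, so either list being non-empty is a sign to add more examples before saving.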
Here is the complete script:
from datasets import Dataset, DatasetDict

train_dataset = Dataset.from_dict({
    "text": [],
    "label": []
})
test_dataset = Dataset.from_dict({
    "text": [],
    "label": []
})

# Each pass adds one training example and one test example
while True:
    answer = input("Add a new item? y/n: ")
    if answer.strip().lower() != "y":
        break
    train_dataset = train_dataset.add_item({
        "text": input("Enter train text: "),
        "label": int(input("Enter train label number: ")),
    })
    test_dataset = test_dataset.add_item({
        "text": input("Enter test text: "),
        "label": int(input("Enter test label number: ")),
    })

dataset_dict = DatasetDict({"train": train_dataset, "test": test_dataset})
print("Saving dataset...", dataset_dict)
dataset_dict.save_to_disk("my_dataset_v1")
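One weakness of this loop is that int() raises ValueError on a mistyped label, which stops the script before save_to_disk is reached, so every item entered in that session is lost. Here is a minimal sketch of a safer prompt, assuming the number of classes is known in advance; ask_label and its read parameter (which lets you pass a fake input function in tests) are hypothetical names, not part of any library.

```python
def ask_label(prompt, num_labels, read=input):
    """Ask until the user enters an integer label in the range 0..num_labels-1."""
    while True:
        raw = read(prompt).strip()
        try:
            label = int(raw)
        except ValueError:
            print(f"{raw!r} is not a number, try again")
            continue
        if 0 <= label < num_labels:
            return label
        print(f"The label must be between 0 and {num_labels - 1}")


# Usage inside the loop, assuming four intents (labels 0 to 3):
# "label": ask_label("Enter train label number: ", num_labels=4),
```

The range check also catches labels that are valid numbers but do not correspond to any intent, which plain int() would silently accept.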
How do you add data manually to a local dataset?
This works almost the same way, with minor changes. First we load our dataset from disk, then run the same endless loop, and finally save the result as a new dataset with a new name and version number:
from datasets import DatasetDict

datasetname = "my_dataset_v1"
# load_from_disk is a static method, so it is called on the class itself
new_dataset_dict = DatasetDict.load_from_disk(datasetname)
print("Loaded dataset:", new_dataset_dict)
train_dataset = new_dataset_dict["train"]
test_dataset = new_dataset_dict["test"]

while True:
    answer = input("Add a new item? y/n: ")
    if answer.strip().lower() != "y":
        break
    train_dataset = train_dataset.add_item({
        "text": input("Enter train text: "),
        "label": int(input("Enter train label number: ")),
    })
    test_dataset = test_dataset.add_item({
        "text": input("Enter test text: "),
        "label": int(input("Enter test label number: ")),
    })

new_dataset_dict = DatasetDict({"train": train_dataset, "test": test_dataset})

# Build the new name: my_dataset_v1 -> my_dataset_v2; a name without "_v" gets "_v1"
idx = datasetname.rfind("_v")
if idx != -1:
    version = int(datasetname[idx + 2:]) + 1
    new_datasetname = datasetname[:idx] + f"_v{version}"
else:
    new_datasetname = datasetname + "_v1"
new_dataset_dict.save_to_disk(new_datasetname)
print("Saved as", new_datasetname, new_dataset_dict)
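The versioning logic above calls int() on everything after the last “_v”, so a name such as my_voice_data (which contains “_v” but no version number) would raise ValueError. Here is a minimal sketch of a stricter helper, next_version_name (a hypothetical name), that only treats a trailing _v followed by digits as a version:

```python
import re


def next_version_name(name):
    """Increment a trailing _v<number> suffix, or append _v1 if the name has none."""
    # Greedy (.*) makes the regex pick the last "_v<digits>" that ends the string
    match = re.fullmatch(r"(.*)_v(\d+)", name)
    if match is None:
        return name + "_v1"
    base, version = match.group(1), int(match.group(2))
    return f"{base}_v{version + 1}"


print(next_version_name("my_dataset_v1"))  # my_dataset_v2
print(next_version_name("my_dataset"))     # my_dataset_v1
print(next_version_name("my_voice_data"))  # my_voice_data_v1
```

With this helper the if/else block reduces to new_datasetname = next_version_name(datasetname).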
How do you create a local dataset from a CSV file?
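This is the most useful approach in practice: you have already collected your training data in CSV files and need to turn it into a dataset for any of the Transformers models. In my case, the requests to turn the music on and to turn it off were in two separate files. Each file starts with a header row, followed by one example per row: the text in the first column and the integer label in the second. The first 30 rows of music_on.csv and the first 25 rows of music_off.csv go to the training set, and the rest go to the test set. Here is the code: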
import csv
from datasets import Dataset, DatasetDict

# Collect plain Python lists first, then build each Dataset once
train_data = {"text": [], "label": []}
test_data = {"text": [], "label": []}


def add_rows_from_csv(path, train_count):
    """Put the first train_count rows of a CSV file into train_data and the rest into test_data."""
    with open(path, "r", encoding="utf-8") as file:
        reader = csv.reader(file)
        next(reader)  # skip the header
        for num, row in enumerate(reader, start=1):
            target = train_data if num <= train_count else test_data
            target["text"].append(row[0])
            target["label"].append(int(row[1]))


add_rows_from_csv("scripts/mydatasets/music_on.csv", train_count=30)
add_rows_from_csv("scripts/mydatasets/music_off.csv", train_count=25)

train_dataset = Dataset.from_dict(train_data)
test_dataset = Dataset.from_dict(test_data)

dataset_dict = DatasetDict({"train": train_dataset, "test": test_dataset})
print(train_dataset["text"], train_dataset["label"])
print("Saving dataset...", dataset_dict)
dataset_dict.save_to_disk("my_dataset")
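The fixed counts (30 and 25) put the first rows of each file into training, so if a file is sorted or grouped in some way, the test set may not resemble the training set. Here is a minimal sketch of an alternative, swapped in for illustration and not the author's method: shuffle each file's rows with a fixed seed and split by a fraction. Splitting each file separately still keeps both classes in both splits, just as the original code does. The function names read_rows and split_rows and the 0.2 test fraction are assumptions. (For a dataset that already exists, the datasets library also provides Dataset.train_test_split with test_size and seed arguments.)

```python
import csv
import io
import random


def read_rows(file_obj):
    """Read (text, label) pairs from a CSV file object that has a header row."""
    reader = csv.reader(file_obj)
    next(reader, None)  # skip the header
    return [(row[0], int(row[1])) for row in reader if row]  # ignore blank lines


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train_rows, test_rows)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # private RNG, so the global random state is untouched
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


# Small in-memory stand-in for a file such as music_on.csv (assumed layout: text,label).
sample = io.StringIO("text,label\n" + "".join(f"request {i},1\n" for i in range(10)))
train_rows, test_rows = split_rows(read_rows(sample))
print(len(train_rows), len(test_rows))  # 8 2
```

For a real file, open(path, encoding="utf-8") takes the place of the StringIO, and the resulting lists of (text, label) pairs can then be added to the training and test datasets. Because the seed is fixed, rerunning the script produces the same split.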
After that, you can move on to training a text classification model on this dataset. That is covered in the follow-up article.
On my VK page you can follow what I'm working on right now, and maybe we can build something together. Join in!