Handling the unrecognizable characters in Stanford Parser

The Stanford Parser/Tokenizer can print warnings like the following when it meets a character it does not recognize:
WARNING: Unrecognizable .... U+FFD ... for some character.
The solution is documented here:
http://nlp.stanford.edu/software/tokenizer.shtml
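The relevant tokenizer option is untokenizable. From that page: What to do with untokenizable characters (ones not known to the tokenizer). Six options combining whether to log a warning for none, the first, or all, and whether to delete them or to include them as single character tokens in the output: noneDelete, firstDelete, allDelete, noneKeep, firstKeep, allKeep. The default is "firstDelete". So to silence the warning entirely, pick one of the "none" values: noneDelete drops the offending characters, noneKeep keeps them as single character tokens.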
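Here is a minimal sketch of building that option in Java. It only assembles and validates the "untokenizable=..." string using the six values quoted above, so it runs without the Stanford jar. The class name UntokenizableOption and the helper optionString are my own, not part of the Stanford library. The commented-out call assumes the PTBTokenizer(Reader, LexedTokenFactory, String) constructor from the Stanford distribution, so check it against the version you have.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the Stanford library): builds the
// options string for the tokenizer's "untokenizable" setting.
final class UntokenizableOption {

    // The six values listed in the tokenizer documentation.
    static final List<String> VALUES = Arrays.asList(
            "noneDelete", "firstDelete", "allDelete",
            "noneKeep", "firstKeep", "allKeep");

    // Returns "untokenizable=<value>", rejecting anything outside the six values.
    static String optionString(String value) {
        if (!VALUES.contains(value)) {
            throw new IllegalArgumentException(
                    "Unknown untokenizable value: " + value);
        }
        return "untokenizable=" + value;
    }

    public static void main(String[] args) {
        // noneDelete: log no warnings and drop the unknown characters.
        String options = optionString("noneDelete");
        System.out.println(options); // prints untokenizable=noneDelete

        // Assumption: with the Stanford jar on the classpath, the string
        // would be handed to the tokenizer roughly like this:
        //   new PTBTokenizer<>(reader, new CoreLabelTokenFactory(), options);
    }
}
```

Use noneKeep instead if the unknown characters should stay in the output as single character tokens. The tokenizer options string is a comma-separated list, so this value can be combined with other options.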
