
Posts

Showing posts from 2012

Cleaning, Parsing and Extracting with Tika

I have normally used HtmlCleaner or jsoup for parsing and cleaning. I have also tried Boilerpipe, but wasn't satisfied for various reasons. Each of these has its pros and cons, and one of them can be chosen as the application requires. I got a chance to try Apache Tika, and it's very good at parsing and cleaning HTML and scripts. Apart from this, it can also handle many other formats such as PDF, DOC and ODF. Solr also uses Tika for all these tasks, so staying in sync with Solr makes sense if something is being built on top of Solr. Read more about the Tika parser: http://tika.apache.org/   [Logs for myself  :) ]

Dairy and Milk related schemes

http://nabard.org/deds_brochure.asp  : DAIRY ENTREPRENEURSHIP DEVELOPMENT SCHEME - SPONSORED BY GoI

The Dairy and Poultry Venture Capital Fund, launched in 2005-06, was split into separate Dairy and Poultry Venture Capital Funds during 2009-10. The mode of implementation of the Dairy Venture Capital Fund was changed from an interest-free loan to a capital subsidy, and the revised scheme, the Dairy Entrepreneurship Development Scheme (DEDS), came into effect on 1 September 2010.

Objectives
* To promote the setting up of modern dairy farms for production of clean milk
* To encourage heifer calf rearing and thereby conserve good breeding stock
* To bring structural changes in the unorganized sector so that initial processing of milk can be taken up at the village level itself
* To upgrade quality and traditional technology to handle milk on a commercial scale
* To generate self-employment and provide infrastructure, mainly for the unorganized sector

Activities covered and indic

Every pixel of every moment

No matter how much you say you don't want them, memories come back and stand before you again. They push out everything that was going on in your head and settle back in with such authority that even if you take the intersection of the memories and the space in your head, it comes out as large as the union of the two, and even after that the heart is still packed full. The same thing happened today. Quite suddenly, he remembered her today, with no context before or after. Tired from chattering all day, he abruptly went quiet in the evening. He went and sat away from the crowd, thinking. Sighing heavily. Thinking again. This went on for a long time. His face was utterly downcast. Anyone who saw him would have said he was about to burst into tears. All of her was sliding past his eyes like a reel of film, one moment at a time. Her laughing. Her sulking. Her being playful and affectionate. All her expressions appeared before him perfectly clear and in high resolution. He had forgotten her completely. So what was this, all of a sudden? Because when a person has truly forgotten someone, he cannot properly recall that person's face however hard he tries. But here, right before his eyes, every pixel of every moment was clear

Whitespace character not removed by Java trim()

Java's String.trim() strips only leading and trailing characters whose code is U+0020 or lower (the ASCII space and control characters). Other whitespace characters are not trimmed, and when you print the numeric value of such a character in the string you will find 160:

        char c = str.charAt(0);
        int val = (int) c;
        System.out.println("The char:" + val + ":");

output:  The char:160:
Unicode equivalent:  \u00A0 (the no-break space)

So we need a Unicode-aware method for trimming this. Some suggest doing it with Google's Guava library: http://code.google.com/p/guava-libraries/

Custom solution:

boolean isWhiteSpace(String word) {
    // True when nothing is left after removing every no-break space (U+00A0).
    return word.replaceAll("\\u00A0", "").length() < 1;
}

And then you can decide whether to drop the word from processing.
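As a rough sketch of the Unicode-aware trimming mentioned above, something like the following could work using only the standard library. The names NbspTrim, isUnicodeSpace and unicodeTrim are made up for this illustration, and treating a character as trimmable when either Character.isWhitespace or Character.isSpaceChar accepts it is my assumption about what "whitespace" should mean here, not the only reasonable definition.

```java
final class NbspTrim {

    // Hypothetical helper: true for anything Java already treats as whitespace,
    // plus Unicode space separators such as the no-break space U+00A0,
    // which Character.isWhitespace deliberately excludes.
    static boolean isUnicodeSpace(char c) {
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    // Hypothetical helper: strips leading and trailing characters
    // accepted by isUnicodeSpace; inner characters are left alone.
    static String unicodeTrim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isUnicodeSpace(s.charAt(start))) {
            start++;
        }
        while (end > start && isUnicodeSpace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String raw = "\u00A0 hello\u00A0";
        // trim() leaves the string untouched because it starts and ends with U+00A0.
        System.out.println(raw.trim().length());      // 8
        System.out.println(unicodeTrim(raw).length()); // 5
    }
}
```

Unlike the replaceAll check, this only touches the ends of the string, so a no-break space between two words is preserved.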

Publishing business basics

Basic Steps:
1. Decide on a name for the company
2. Register the company with the ministry - you will need an attorney (a lawyer) for that
3. Register with the Registrar of Newspapers for India if it's a magazine or newspaper
4. Study the relevant acts in general, or have the lawyer explain them
5. Start publishing

Following are details regarding the same (not that well written):
-----
Some starting points and books:
* Start Your Own Self-Publishing Business (Entrepreneur Magazine's Start Up) by Entrepreneur Press
* How To Start And Run A Small Book Publishing Company: A Small Business Guide To Self-Publishing And Independent Publishing by Peter I. Hupalo
* Art & Science Of Book Publishing by Herbert S., Jr. Bailey
* This Business of Books: A Complete Overview of the Industry from Concept Through Sales by Claudia Suzanne

Raja Rammohun Roy National Agency for ISBN
West Block-I, Wing-6, 2nd Floor, Sector -I, R.K. Puram, New Delhi-110066

Some new things and the initiatives in the

Where to start with AI?

Some interesting, must-read starting points for Artificial Intelligence:
http://www.aaai.org/home.html
http://aitopics.net/NewsFinderInfo
And specifically, Natural Language Processing (NLP): http://aitopics.net/NaturalLanguage
Machine Learning: http://aitopics.net/MachineLearning

Time and Temperature across the world

Rome (Italy)
Berlin (Germany)
Moscow (Russia)
Madrid (Spain)
Bruxelles National (Belgium)
San José (Guatemala)
Acapulco (Mexico)
Rio (Brazil)
Hong Kong (China)
Tokyo (Japan)
Ovda (Israel)
Makokou (Gabon)
Puerto Plata (Dominican Republic)
Nassau (Bahamas)
Port-au-Prince (Haiti)
Athens (Greece)
Zurich (Switzerland)
Dublin (Ireland)
Bayreuth (Lebanon)
Budapest (Hungary)
Belgrade (Yugoslavia)
Kuwait (Kuwait)
New Delhi (India)
Paris (France)
Quebec (Quebec, Canada)
Ottawa (Ontario, Canada)
Cairo (Egypt)
Casablanca (Morocco)
Los Angeles (California, USA)
Tehran (Iran)
Addis Ababa-Bole (Ethiopia)