If the data is not in XLS or CSV files, it may be in HTML or PDF. In those cases you will need to scrape it to get it into a spreadsheet before you can analyse it. Even then, the data may be messy and need cleaning.
But don't worry: there are many tools that make this very simple. Below are some resources, ranging from simple to complex.
1.- Clean the data
OpenRefine: OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data. It has its own expression language, the General Refine Expression Language (GREL), which accepts regular expressions (the black magic). With regular expressions you can, for example, detect unwanted characters in a text field and delete them.
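To get a feel for the kind of regex cleanup GREL is used for, here is a rough Python equivalent (a sketch, not GREL itself; the function name and the sample value are made up for illustration):

```python
import re

def clean_numeric_cell(raw):
    """Strip currency symbols, thousands separators, and stray whitespace
    from a messy numeric cell -- the sort of fix a GREL expression with a
    regular expression would do inside OpenRefine."""
    cleaned = re.sub(r"[^\d.,-]", "", raw)  # keep only digits and separators
    cleaned = cleaned.replace(",", "")       # drop thousands separators
    return cleaned.strip()

print(clean_numeric_cell("  $1,234.50 "))  # -> 1234.50
```

The same idea scales to whole columns: in OpenRefine you would apply one expression to every cell in the column at once.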
2.- Extract the data
Import.io: It will let you extract large amounts of data from a web page into an Excel spreadsheet.
Tabula: An open-source tool that makes it easy to extract tables from PDF files.
importHTML and importXML: two great Google Spreadsheets functions that let you grab elements of a web page's DOM (HTML) and import them into a spreadsheet.
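In a Google Spreadsheet the formula looks like =IMPORTHTML("url", "table", 1). If you want the same result outside a spreadsheet, a small Python sketch with the standard library can pull the cells out of an HTML table (the class name and the sample HTML below are invented for the example):

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row -- roughly
    what =IMPORTHTML(url, "table", 1) does for the first table on a page."""
    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows
        self._row = []        # cells of the row being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

html = ("<table><tr><th>City</th><th>Pop</th></tr>"
        "<tr><td>Madrid</td><td>3.2M</td></tr></table>")
scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)  # -> [['City', 'Pop'], ['Madrid', '3.2M']]
```

For a real page you would fetch the HTML first (e.g. with urllib) and feed it to the parser; the spreadsheet functions remain the quickest route when you just need the data in a sheet.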